Skip to content

PVAS-Development/whisperdesk

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

🎙️ WhisperDesk

Downloads Release Version Lint Stars Forks Good First Issue Contributions Welcome

A beautiful, native macOS desktop application for transcribing audio and video files using whisper.cpp.

🌐 Visit our website | 📥 Download Latest Release

WhisperDesk Screenshot

✨ Features

  • Drag & Drop - Drag single or multiple files to create a batch queue
  • Batch Processing - Process unlimited files sequentially with automatic queue management
  • Multiple Formats - Supports MP3, WAV, M4A, FLAC, OGG, WMA, AAC, AIFF, MP4, MOV, AVI, MKV, WebM, WMV, FLV, M4V
  • Multiple Models - Choose from tiny, base, small, medium, large-v3, or large-v3-turbo Whisper models (including English-only variants)
  • Output Formats - Export as VTT subtitles, SRT subtitles, plain text, Word (.docx), PDF, or Markdown
  • Language Support - Auto-detect or select from 12+ languages
  • Apple Silicon Optimized - Native Metal GPU acceleration on M1/M2/M3/M4 Macs
  • Dark Mode - Beautiful dark theme that respects your system preference
  • Auto Updates - Automatic update notifications when new versions are available
  • Keyboard Shortcuts - Full keyboard navigation support
  • Transcription History - Keep track of your recent transcriptions
  • Native Performance - Uses whisper.cpp for fast, efficient transcription
  • TypeScript - Fully typed codebase for better maintainability
  • Feature-Driven Architecture - Modular codebase organized by feature domains

📋 Requirements

  • macOS 10.15 (Catalina) or later
  • FFmpeg (Required for audio processing)
  • ~500MB disk space (for whisper.cpp and models)

Note: WhisperDesk requires FFmpeg to process audio files. The app will check for it on startup and guide you if it's missing.

🚀 Installation

1. Install Prerequisites

WhisperDesk requires FFmpeg to be installed on your system to process audio and video files.

Install Homebrew (if not already installed)

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

Install FFmpeg

brew install ffmpeg

Install CMake (for building whisper.cpp)

brew install cmake

2. Install WhisperDesk

Option A: Download DMG (Recommended)

  1. Download the latest WhisperDesk-x.x.x.dmg from Releases
  2. Open the DMG file
  3. Drag WhisperDesk to your Applications folder
  4. Important: Ensure you have FFmpeg installed (see Prerequisites)
  5. Launch WhisperDesk from Applications

Option B: Build from Source

# Clone the repository
git clone https://github.com/PVAS-Development/whisperdesk.git
cd whisperdesk

# Install dependencies
npm install

# Build whisper.cpp with Metal support (downloads base model)
# For development (current architecture only):
npm run setup:whisper

# For production (universal binary - Intel + Apple Silicon):
npm run setup:whisper:universal

# Run in development mode
npm run electron:dev

# Or build for production (automatically builds universal binary)
npm run electron:build

🎮 Usage

  1. Open Files - Drag and drop audio/video files (single or batch) into the app, or click to browse
  2. Configure Settings - Choose your preferred model, language, and output format
  3. Transcribe - Click "Transcribe" to process the entire queue sequentially
  4. Save/Copy - Save the transcription from the save dialog (choose from .txt, .docx, .pdf, .md, .srt, or .vtt formats) or copy to clipboard

Keyboard Shortcuts

Shortcut Action
Cmd+O Open file
Cmd+S Save transcription
Cmd+C Copy transcription
Cmd+Return Start transcription
Cmd+H Toggle history
Escape Cancel transcription

🧠 Whisper Models

Model Size Speed Quality Best For
tiny 75 MB ~10x ★☆☆☆☆ Quick drafts, testing
base 142 MB ~7x ★★☆☆☆ Fast transcription
small 466 MB ~4x ★★★☆☆ Balanced speed/quality
medium 1.5 GB ~2x ★★★★☆ High quality
large-v3 3.1 GB ~1x ★★★★★ Best quality
large-v3-turbo 1.6 GB ~2x ★★★★★ Fast + quality

English-only variants (.en) are available for tiny, base, small, and medium models.

Models are downloaded automatically on first use and cached in:

  • Development: PROJECT_ROOT/models/
  • Production: ~/Library/Application Support/WhisperDesk/models/

🔧 Development

Prerequisites

  • Node.js 22.12+ (use nvm use to auto-switch via .nvmrc)
  • CMake (for building whisper.cpp)
  • FFmpeg

Setup

# Clone and install
git clone https://github.com/PVAS-Development/whisperdesk.git
cd whisperdesk
npm install

# Build whisper.cpp and download base model
npm run setup:whisper

# Run development server
npm run electron:dev

Building

# Build for macOS
npm run electron:build:mac

# Build directory only (faster, for testing)
npm run electron:build:dir

Contributing

This project uses conventional commits for consistent commit messages.

Development Flow

  1. Create a feature branch from main:

    git checkout -b feat/my-feature
  2. Make changes with conventional commits and create PR to main

Commit Message Convention

Use Conventional Commits for clear history:

Commit Type Example Description
feat: feat: add PDF export New feature
fix: fix: crash on startup Bug fix
perf: perf: faster loading Performance improvement
refactor: refactor: simplify logic Code refactoring
docs: docs: update README Documentation
chore: chore: update deps Maintenance
style: style: format code Code style
test: test: add unit tests Tests
ci: ci: fix workflow CI/CD changes
build: build: update config Build changes

Issues & Discussions

  • Bug report? Open an issue via the built-in bug report template so we collect macOS version, WhisperDesk version, reproduction steps, and relevant logs automatically.
  • Feature idea? Start a thread in Discussions. We prefer to explore new ideas there and will only create an issue once we understand the scope.
  • Before you post: search the existing issues and discussions to avoid duplicates and help us respond faster.

Testing

WhisperDesk has a comprehensive test suite with 335+ tests covering utilities, services, hooks, and React components.

Run Tests

# Run all tests once (CI mode)
npm run test:run

# Run tests with watch mode
npm run test

# Run tests with UI dashboard
npm run test:ui

# Run tests with coverage
npm run test:coverage

Test Coverage

  • Unit Tests - Utilities, formatters, validators, storage, and services
  • Component Tests - SettingsPanel, FileDropZone, OutputDisplay
  • Service Tests - Electron API, transcription service, model service, history storage
  • Test Framework: Vitest with jsdom
  • Component Testing: @testing-library/react
  • Pre-commit Hooks - Lint and format checks run automatically before every commit (via husky + lint-staged)

CI/CD Pipeline

Tests run automatically in GitHub Actions on every PR and push:

  • ✅ Linting & formatting checks
  • ✅ TypeScript type checking
  • ✅ Unit & component tests (335+ tests)
  • ✅ Production build validation

Available Scripts

Script Description
npm run dev Start Vite dev server
npm run electron:dev Start app in development mode
npm run setup:whisper Build whisper.cpp (current architecture)
npm run setup:whisper:universal Build whisper.cpp (universal binary)
npm run electron:build Builds macOS DMG (with universal binary)
npm run electron:build:mac Builds macOS DMG (with universal binary)
npm run electron:build:dir Build directory only (faster, testing)
npm run icons Generate app icons from SVG
npm run lint Run ESLint
npm run lint:fix Run ESLint with auto-fix
npm run typecheck Run TypeScript type checking
npm run format Format code with Prettier
npm run format:check Check code formatting
npm run test Run tests with watch mode
npm run test:ui Run tests with dashboard UI
npm run test:run Run tests once (CI mode)
npm run test:coverage Run tests with coverage report

Architecture

This project follows a modern Electron architecture with strict separation of concerns:

  • src/main/: Electron Main process (TypeScript). Handles OS integration, window management, and native services.
  • src/preload/: Preload scripts (TypeScript). Exposes a secure, typed API to the renderer via contextBridge.
  • src/renderer/: React application (TypeScript). The UI layer, built with Vite.
  • src/shared/: Shared types and constants used by both Main and Renderer processes.

Security Features:

  • Context Isolation: Enabled. Renderer cannot access Node.js primitives directly.
  • Sandbox: Enabled. Renderer runs in a sandboxed environment.
  • IPC: All communication happens via typed IPC channels defined in src/main/ipc/.

Project Structure

whisperdesk/
├── src/
│   ├── main/                # Electron Main process (TypeScript)
│   │   ├── index.ts         # Entry point
│   │   ├── ipc/             # IPC Handlers
│   │   ├── services/        # Business logic (Whisper, FileSystem)
│   │   └── utils/           # Utilities
│   ├── preload/             # Preload scripts (TypeScript)
│   │   └── index.ts         # Secure API exposure
│   ├── renderer/            # React frontend (TypeScript)
│   │   ├── App.tsx          # Main app component
│   │   ├── main.tsx         # React entry point
│   │   ├── components/      # Shared UI components
│   │   ├── features/        # Feature-based modules
│   │   └── ...
│   └── shared/              # Shared types and constants
├── dist-electron/           # Output folder for main process build
├── dist/                    # Output folder for renderer build
├── scripts/                 # Build and setup scripts
├── bin/                     # whisper-cli binary (built)
└── models/                  # Downloaded GGML models (dev)

🐛 Troubleshooting

"whisper.cpp not found" error

Run the setup script to build whisper.cpp:

npm run setup:whisper

"FFmpeg not found" error

Install FFmpeg via Homebrew:

brew install ffmpeg

Slow transcription

  • Use a smaller model (tiny or base) for faster results
  • Ensure you're using GPU acceleration (shown in app settings)
  • Close other resource-intensive applications

App won't open (macOS Gatekeeper)

The app is code-signed and notarized by Apple, so it should open normally. If you still see a warning:

  1. Right-click the app and select "Open"
  2. Click "Open" in the dialog that appears

For builds from source (unsigned), you may need to run:

xattr -cr /Applications/WhisperDesk.app

🤝 Contributing

Contributions are welcome! Please see our Contributing Guide for details on:

  • Setting up your development environment
  • Code style and commit conventions
  • Submitting pull requests

🔒 Privacy & Security

  • Local Processing: All audio/video processing happens locally on your device. Your files never leave your computer.
  • No Cloud Uploads: We do not upload your media files or transcriptions to any server.
  • Anonymous Analytics: We collect minimal, anonymous usage data (e.g., app launches, feature usage) to improve the app. No personal data or file content is collected.
  • Code Signing: The app is code-signed and notarized by Apple for your safety.

For more details, please read our Privacy Policy.

☕ Support the Project

WhisperDesk is free and open-source software. If you find it useful, please consider supporting its development:

Your support helps cover the costs of Apple Developer Program fees and keeps the project alive!

📄 License

MIT License - see LICENSE for details.

🙏 Acknowledgments


Made with ❤️ for the transcription community

About

A beautiful, native macOS desktop application for transcribing audio and video files using whisper.cpp

Topics

Resources

License

Code of conduct

Contributing

Stars

Watchers

Forks

Sponsor this project

Packages

No packages published

Languages