@obvirm obvirm commented Dec 30, 2025

jfk_final.mp4

& '.\whisper-cpp-dtw - Copy\build\bin\Release\whisper-cli.exe' -m .\whisper-cpp-dtw\models\ggml-base.en.bin -f sample_audio.wav --dtw base.en --no-flash-attn --max-len 1 --output-srt -of test_copy

  • Add phoneme-based onset shift (vowels/plosives 150 ms, other consonants 80 ms)
  • Enforce adaptive minimum token duration based on text length
  • Cap maximum duration to prevent stretching over silences
  • Sync segment boundaries with corrected token timestamps
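
A minimal Python sketch of how the four steps above could fit together. It is not the code in this PR. The `Token` type, the function names, the letter-based phoneme proxy, the assumption that the shift moves the onset earlier, and the constants `min_base_ms`, `min_per_char_ms`, and `max_ms` are all hypothetical. Only the 150 ms and 80 ms shift values come from the description.

```python
from dataclasses import dataclass

# Assumption: the first letter stands in for the first phoneme.
VOWELS = set("aeiou")
PLOSIVES = set("pbtdkg")


@dataclass
class Token:
    text: str
    t0: int  # start time in ms
    t1: int  # end time in ms


def onset_shift_ms(text: str) -> int:
    """Return the onset shift for a token: 150 ms or 80 ms."""
    word = text.strip().lower()
    if not word or not word[0].isalpha():
        return 0  # punctuation or empty token: no shift
    if word[0] in VOWELS or word[0] in PLOSIVES:
        return 150
    return 80


def correct_tokens(tokens, min_base_ms=60, min_per_char_ms=20, max_ms=1000):
    """Apply onset shift, adaptive minimum duration, and a duration cap."""
    corrected = []
    for tok in tokens:
        # Assumption: the shift moves the onset earlier, clamped at 0.
        t0 = max(0, tok.t0 - onset_shift_ms(tok.text))
        # Minimum duration grows with text length (hypothetical formula).
        n_chars = len(tok.text.strip())
        min_dur = min(min_base_ms + min_per_char_ms * n_chars, max_ms)
        dur = max(tok.t1 - t0, min_dur)
        # Cap the duration so a token does not stretch over a silence.
        dur = min(dur, max_ms)
        corrected.append(Token(tok.text, t0, t0 + dur))
    return corrected


def sync_segment(tokens):
    """Segment boundaries follow the first and last corrected tokens."""
    return tokens[0].t0, tokens[-1].t1


tokens = correct_tokens([Token(" And", 500, 520), Token(" so", 900, 4000)])
segment = sync_segment(tokens)
```

The sketch does not resolve overlaps between neighbouring tokens after shifting; a real implementation would likely need to clamp each token against its neighbours.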

Improves DTW timestamp accuracy for word-level subtitle generation.
