skill-kokoro-tts-tool
Local text-to-speech using Kokoro TTS
When to use
- When you need to convert text to speech locally (no API keys)
- When you need to generate audio from long documents (books, articles)
- When you need seamless audiobook rendering without pop artifacts
- When you need fast offline TTS rendering (20-50x real-time)
kokoro-tts-tool Skill
Purpose
This skill provides access to the kokoro-tts-tool CLI for local text-to-speech synthesis using the Kokoro-82M model. It runs entirely on-device with ONNX Runtime and is optimized for Apple Silicon.
When to Use This Skill
Use this skill when:
- Converting text to speech without cloud APIs
- Generating audio from markdown/text documents
- Creating audiobooks from long-form content
- Needing 60+ voices across 8 languages
Do NOT use this skill for:
- Cloud-based TTS services
- Real-time voice conversion
- Speech-to-text (transcription)
CLI Tool: kokoro-tts-tool
Local text-to-speech CLI using Kokoro-82M (82 million parameters).
Installation
# Clone and install
git clone https://github.com/dnvriend/kokoro-tts-tool.git
cd kokoro-tts-tool
uv tool install .
Prerequisites
- Python 3.14+
- uv package manager
- Apple Silicon Mac (recommended)
Quick Start
# Initialize (downloads ~350MB models)
kokoro-tts-tool init
# Synthesize text to speakers
kokoro-tts-tool synthesize "Hello world"
# Save to file
kokoro-tts-tool synthesize "Hello" --output speech.wav
# Stream a document
kokoro-tts-tool infinite --input book.md
Progressive Disclosure
<details>
<summary><strong>📖 Core Commands (Click to expand)</strong></summary>
init - Download TTS Models
Downloads the Kokoro ONNX model (~300MB) and voice embeddings (~50MB).
Usage:
kokoro-tts-tool init [OPTIONS]
Options:
--force,-f: Re-download models even if they exist
Examples:
# Download models (skips if already present)
kokoro-tts-tool init
# Force re-download
kokoro-tts-tool init --force
synthesize - Convert Text to Speech
Synthesizes text using the Kokoro TTS model. Audio can be played through speakers or saved to file.
Usage:
kokoro-tts-tool synthesize [TEXT] [OPTIONS]
Arguments:
TEXT: Text to synthesize (optional if using --stdin)
Options:
--stdin,-s: Read text from stdin
--voice,-v VALUE: Voice ID (default: af_heart)
--output,-o PATH: Save to WAV file
--speed FLOAT: Speech speed 0.5-2.0 (default: 1.0)
--silence INT: Trailing silence in ms (default: 200)
Examples:
# Play text with default voice
kokoro-tts-tool synthesize "Hello world"
# Use different voice
kokoro-tts-tool synthesize "Hello" --voice am_adam
# Save to file
kokoro-tts-tool synthesize "Hello" --output speech.wav
# Read from stdin
echo "Hello world" | kokoro-tts-tool synthesize --stdin
# Adjust speed
kokoro-tts-tool synthesize "Hello" --speed 1.5
# Multiple options
cat article.txt | kokoro-tts-tool synthesize --stdin \
--voice bf_emma \
--output article.wav \
--speed 0.9
Output: Audio played through speakers (default) or saved as WAV file (24kHz, mono, 16-bit).
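The --stdin and --output options above combine naturally into a batch job. The following is a minimal sketch, assuming a directory of .txt files; the helper names wav_name and synth_dir and the KOKORO_BIN variable are hypothetical and exist only for this example (the tool itself does not read KOKORO_BIN).

```shell
#!/bin/sh
# KOKORO_BIN is a hypothetical override for this sketch, not something the tool reads.
KOKORO_BIN="${KOKORO_BIN:-kokoro-tts-tool}"

# wav_name FILE: map notes/intro.txt to notes/intro.wav.
wav_name() {
    printf '%s.wav\n' "${1%.txt}"
}

# synth_dir DIR [VOICE]: synthesize every *.txt file in DIR to a WAV next to it.
synth_dir() {
    dir=$1
    voice=${2:-af_heart}   # af_heart is the documented default voice
    for txt in "$dir"/*.txt; do
        [ -e "$txt" ] || continue   # no matches: the glob pattern stays literal
        "$KOKORO_BIN" synthesize --stdin --voice "$voice" \
            --output "$(wav_name "$txt")" < "$txt" || return 1
    done
}
```

A call such as `synth_dir ./chapters bf_emma` would then write one WAV per text file and stop at the first failure.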
infinite - Stream Long Documents
Reads markdown or plain text, splits it into chunks of a target word count (--chunk-size), and streams the audio to speakers or renders it to a file.
Usage:
kokoro-tts-tool infinite [OPTIONS]
Options:
--input,-i PATH: Input text/markdown file
--stdin,-s: Read text from stdin
--output,-o PATH: Save to WAV file (fast offline mode)
--voice VALUE: Voice ID (default: af_heart)
--speed FLOAT: Speech speed 0.5-2.0 (default: 1.0)
--chunk-size INT: Target words per chunk 50-1000 (default: 200)
--pause INT: Pause between chunks in ms 0-2000 (default: 150)
--no-markdown: Treat input as plain text
Examples:
# Stream to speakers
kokoro-tts-tool infinite --input book.md
# Render to WAV (fast, ~2-3min for 1hr audio)
kokoro-tts-tool infinite --input book.md --output audiobook.wav
# Pipe from stdin
cat chapter.md | kokoro-tts-tool infinite --stdin
# With custom voice and speed
kokoro-tts-tool infinite --input notes.md \
--voice am_adam \
--speed 1.2
# Render audiobook with narrator voice
kokoro-tts-tool infinite --input book.md \
--output book.wav \
--voice bm_george \
--speed 0.95
# Longer pauses between chunks for studying (200 words is the default chunk size)
kokoro-tts-tool infinite --input study.md \
--chunk-size 200 \
--pause 600
Output:
- Speaker mode: Real-time playback, seamless audio
- File mode: Fast offline rendering (20-50x real-time on M4)
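To put the 20-50x figure in concrete terms, the arithmetic can be sketched as below. The function name render_seconds is hypothetical, and the estimate simply divides audio length by the speed-up factor; actual render times depend on hardware and text.

```shell
# render_seconds AUDIO_SECONDS SPEEDUP: estimated wall-clock seconds, rounded up.
render_seconds() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# One hour of audio (3600 s) at both ends of the documented 20-50x range.
echo "slow end (20x): $(render_seconds 3600 20) s"
echo "fast end (50x): $(render_seconds 3600 50) s"
```

Under these assumptions, an hour of audio renders in roughly 72 to 180 seconds, which is in line with the "~2-3min for 1hr audio" note in the examples.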
list-voices - List Available Voices
Lists voice information including ID, name, gender, accent, quality grade, and description.
Usage:
kokoro-tts-tool list-voices [OPTIONS]
Options:
--language,-l VALUE: Filter by language (English, Japanese, etc.)
--gender,-g VALUE: Filter by gender (Male, Female)
--json: Output as JSON for scripting
Examples:
# List all voices
kokoro-tts-tool list-voices
# Filter by language
kokoro-tts-tool list-voices --language English
# Filter by gender
kokoro-tts-tool list-voices --gender Female
# Combined filters
kokoro-tts-tool list-voices --language English --gender Male
# JSON output for scripting
kokoro-tts-tool list-voices --json
Voice ID Format:
- Pattern: [language][gender]_[name]
- First letter: language (a=American, b=British, j=Japanese, etc.)
- Second letter: gender (f=Female, m=Male)
Quality Grades:
- A/A-: Highest quality (af_heart, af_bella, am_adam)
- B+/B: Good quality
- B-: Acceptable quality
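The voice ID format described above can be decoded mechanically. This is a small sketch with a hypothetical helper, describe_voice, that covers only the letters listed in this document (a, b, j and f, m); any other letter is reported as unknown rather than guessed.

```shell
# describe_voice VOICE_ID: print "<language> <gender> <name>" for an ID like af_heart.
describe_voice() {
    prefix=${1%%_*}    # text before the first underscore, e.g. "af"
    name=${1#*_}       # text after the first underscore, e.g. "heart"
    case $prefix in
        a?) language=American ;;
        b?) language=British ;;
        j?) language=Japanese ;;
        *)  language=unknown ;;   # other language letters are not listed here
    esac
    case $prefix in
        ?f) gender=Female ;;
        ?m) gender=Male ;;
        *)  gender=unknown ;;
    esac
    printf '%s %s %s\n' "$language" "$gender" "$name"
}
```

For example, `describe_voice bm_george` prints `British Male george`.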
info - Display Configuration
Shows information about the Kokoro TTS installation.
Usage:
kokoro-tts-tool info
Examples:
kokoro-tts-tool info
Output:
- Model status (Ready/Not downloaded)
- Model file locations
- Default settings
- Supported languages
completion - Shell Completion
Generate shell completion scripts for bash, zsh, or fish.
Usage:
kokoro-tts-tool completion SHELL
Arguments:
SHELL: Shell type (bash, zsh, fish)
Examples:
# Bash (add to ~/.bashrc)
eval "$(kokoro-tts-tool completion bash)"
# Zsh (add to ~/.zshrc)
eval "$(kokoro-tts-tool completion zsh)"
# Fish
kokoro-tts-tool completion fish > ~/.config/fish/completions/kokoro-tts-tool.fish
</details>
<details>
<summary><strong>⚙️ Advanced Features (Click to expand)</strong></summary>
Multi-Level Verbosity Logging
Control logging detail with progressive verbosity levels. All logs output to stderr.
Logging Levels:
| Flag | Level | Output | Use Case |
|---|---|---|---|
| (none) | WARNING | Errors and warnings only | Production, quiet mode |
| -v | INFO | + High-level operations | Normal debugging |
| -vv | DEBUG | + Detailed info, full tracebacks | Development |
| -vvv | TRACE | + Library internals | Deep debugging |
Examples:
# INFO level
kokoro-tts-tool -v synthesize "Hello"
# DEBUG level
kokoro-tts-tool -vv infinite --input book.md
# TRACE level
kokoro-tts-tool -vvv synthesize "Hello"
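Because all logs go to stderr, they can be captured separately from normal output. A minimal sketch, assuming a hypothetical wrapper named render_logged and a hypothetical KOKORO_BIN override (the tool itself does not read that variable):

```shell
# KOKORO_BIN is a hypothetical override for this sketch.
KOKORO_BIN="${KOKORO_BIN:-kokoro-tts-tool}"

# render_logged INPUT OUTPUT LOGFILE
# Render INPUT to OUTPUT at DEBUG verbosity; stderr (the logs) goes to LOGFILE.
render_logged() {
    "$KOKORO_BIN" -vv infinite --input "$1" --output "$2" 2>"$3"
}
```

A call such as `render_logged book.md book.wav render.log` keeps the terminal quiet while leaving a DEBUG-level log to inspect if the render fails.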
Pipeline Composition
Compose commands with Unix pipes for workflows.
Examples:
# Get voice IDs as JSON and filter
kokoro-tts-tool list-voices --json | jq '.[].id'
# Read from another command
cat document.md | kokoro-tts-tool infinite --stdin
# Chain with file processing
find . -name "*.md" -exec cat {} \; | kokoro-tts-tool infinite --stdin
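One caveat with the find example: find does not guarantee any particular output order, so chapters may be read out of sequence. A possible workaround, sketched with a hypothetical helper concat_sorted, sorts the paths first. It assumes file names contain no newlines.

```shell
# concat_sorted DIR: print every *.md file under DIR in sorted path order,
# with a blank line after each file so paragraphs do not run together.
concat_sorted() {
    find "$1" -name '*.md' -type f | LC_ALL=C sort | while IFS= read -r path; do
        cat "$path"
        printf '\n'
    done
}
```

The result can be piped as before, for example `concat_sorted ./book | kokoro-tts-tool infinite --stdin`.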
</details>
<details>
<summary><strong>🔧 Troubleshooting (Click to expand)</strong></summary>
Common Issues
Issue: Command not found
# Verify installation
kokoro-tts-tool --version
# Reinstall if needed
cd kokoro-tts-tool
uv tool install . --reinstall
Issue: Models not downloaded
# Initialize models
kokoro-tts-tool init
# Force re-download
kokoro-tts-tool init --force
Issue: Audio not playing
- Check system volume
- Try saving to file: --output test.wav
- Check with verbose: -vv
Issue: Voice not found
# List available voices
kokoro-tts-tool list-voices
# Check voice ID format
kokoro-tts-tool list-voices --json | jq '.[].id'
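The same ID listing can back a quick pre-flight check in scripts. This sketch uses a hypothetical helper, voice_in_list, that reads one ID per line on stdin; it relies on jq's -r option to strip the JSON quotes and assumes the JSON shape implied by the `.[].id` filter above.

```shell
# voice_in_list VOICE_ID: succeed if VOICE_ID appears as a whole line on stdin.
voice_in_list() {
    grep -Fxq -- "$1"
}

# Intended use (requires the tool and jq to be installed):
#   kokoro-tts-tool list-voices --json | jq -r '.[].id' | voice_in_list am_adam \
#       || echo "unknown voice: am_adam" >&2
```

Matching whole lines (-x) with fixed strings (-F) avoids false positives from partial IDs such as am_ada.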
Getting Help
# General help
kokoro-tts-tool --help
# Command-specific help
kokoro-tts-tool synthesize --help
kokoro-tts-tool infinite --help
</details>
Exit Codes
0: Success
1: Error (validation, runtime, or unexpected)
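Scripts can branch on these exit codes. A minimal sketch, assuming a hypothetical wrapper synth_checked and a hypothetical KOKORO_BIN override (the tool itself does not read that variable):

```shell
# KOKORO_BIN is a hypothetical override for this sketch.
KOKORO_BIN="${KOKORO_BIN:-kokoro-tts-tool}"

# synth_checked TEXT OUTPUT: synthesize TEXT to OUTPUT and report failures.
synth_checked() {
    if "$KOKORO_BIN" synthesize "$1" --output "$2"; then
        echo "wrote $2"
    else
        status=$?   # exit status of the failed synthesize call
        echo "synthesis failed with exit code $status" >&2
        return "$status"
    fi
}
```

Since the tool reports every failure as exit code 1, the cause has to come from the stderr message or a rerun with -vv.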
Output Formats
Default Output:
- Human-readable formatted output
- Audio played through speakers
File Output (--output):
- WAV format (24kHz, mono, 16-bit)
JSON Output (--json on list-voices):
- Machine-readable voice data
- Suited to pipelines and scripted processing
Best Practices
- Initialize first: Run kokoro-tts-tool init before synthesis
- Use appropriate voices: Match voice to content (am_adam for audiobooks, bf_emma for education)
- Leverage infinite for documents: Better than synthesize for long content
- Use file output for production: --output for consistent results
- Check voice quality grades: A/A- voices recommended for production
Resources
- GitHub: https://github.com/dnvriend/kokoro-tts-tool
- Kokoro-82M Model: https://huggingface.co/hexgrad/Kokoro-82M
- kokoro-onnx: https://github.com/thewh1teagle/kokoro-onnx