Agent Skill
2/7/2026

autocut-shorts

Main orchestration skill for automatic creation of short-form content (TikTok, YouTube Shorts, Instagram Reels) from long videos. Fully automated workflow: download video, transcribe, detect highlights (transcript + laughter + sentiment + scenes), trim segments, resize to 9:16 portrait, and add subtitles. Finds viral-worthy moments like OpusClip and Vizard.ai.

A
akrindev
0GitHub Stars
1Views
npx skills add akrindev/trimer-clip

SKILL.md

Nameautocut-shorts
DescriptionMain orchestration skill for automatic creation of short-form content (TikTok, YouTube Shorts, Instagram Reels) from long videos. Fully automated workflow: download video, transcribe, detect highlights (transcript + laughter + sentiment + scenes), trim segments, resize to 9:16 portrait, and add subtitles. Finds viral-worthy moments like OpusClip and Vizard.ai.

name: autocut-shorts description: Main orchestration skill for automatic creation of short-form content (TikTok, YouTube Shorts, Instagram Reels) from long videos. Fully automated workflow: download video, transcribe, detect highlights (transcript + laughter + sentiment + scenes), trim segments, resize to 9:16 portrait, and add subtitles. Finds viral-worthy moments like OpusClip and Vizard.ai. allowed-tools: Bash(ffmpeg:) Bash(yt-dlp:) Bash(python:*) compatibility: Requires all trimer-clip dependencies and FFmpeg metadata: version: "1.0" platforms: "TikTok, YouTube Shorts, Instagram Reels, Facebook Reels"

Autocut Shorts

This is the main orchestration skill that combines all other skills to automatically create short-form content from long videos.

What It Does

This skill automates the entire workflow:

  1. Download video from YouTube URL (if provided)
  2. Transcribe audio using Whisper (local), OpenAI Whisper API, Google Speech-to-Text, or Gemini API
  3. Perform speaker diarization (pyannote or Gemini) - identifies who speaks when
  4. Detect highlights using combined analysis:
    • Transcript analysis (hooks, viral phrases)
    • Speaker dynamics (debates, interactions, overlapping speech)
    • Laughter detection (humorous moments)
    • Sentiment analysis (emotional peaks)
    • Scene detection (cut points)
  5. Select best segments (15-60 seconds each)
  6. Trim video to highlight segments
  7. Resize to 9:16 portrait format (1080x1920)
  8. Add burned-in subtitles (segment captions or word-level karaoke when word timestamps are available)
  9. Export multiple clips ready for upload

When to Use

  • User wants to create TikTok clips from a YouTube video
  • Converting podcasts to short-form content
  • Finding viral moments in vlogs or tutorials
  • Repurposing gaming content for Shorts/Reels
  • Batch processing multiple videos

Available Scripts

scripts/autocut.py

Main autocut workflow script.

Usage:

python skills/autocut-shorts/scripts/autocut.py <video_or_url> [options]

Options:

  • --source: Source type (file, youtube) - auto-detected
  • --num-clips: Number of clips to generate (default: 5)
  • --min-duration: Minimum clip duration in seconds (default: 15)
  • --max-duration: Maximum clip duration in seconds (default: 60)
  • --platform: Target platform (tiktok, shorts, reels, facebook) - default: tiktok
  • --output-dir: Output directory (default: ./shorts/)
  • --transcription-model: Transcription model (auto, whisper, gemini, openai, google) - default: auto
  • --whisper-model: Whisper model size (tiny, base, small, medium, large-v3) - default: large-v3
  • --openai-model: OpenAI Whisper model (default: whisper-1)
  • --google-model: Google Speech model (default: latest_long)
  • --diarization-model: Speaker diarization (auto, pyannote, gemini, none) - default: auto
  • --huggingface-token: HuggingFace token for pyannote (or use env var)
  • --focus-speaker: Extract clips only for specific speaker (SPEAKER_00, etc.)
  • --gemini-api-key: Gemini API key (or use env var)
  • --skip-transcribe: Skip transcription if already have transcript
  • --skip-diarization: Skip speaker diarization
  • --skip-scenes: Skip scene detection
  • --skip-laughter: Skip laughter detection
  • --skip-sentiment: Skip sentiment analysis
  • --transcript-path: Use existing transcript file (SRT/VTT/JSON)
  • --word-timestamps-path: Provide word-timestamp JSON for karaoke subtitles
  • --subtitle-mode: Subtitle mode (auto, word, segment) - default: auto
  • --style: Subtitle style (tiktok, shorts, reels, default) - default: tiktok

Examples:

Basic autocut from file:

python skills/autocut-shorts/scripts/autocut.py video.mp4

Autocut from YouTube URL:

python skills/autocut-shorts/scripts/autocut.py "https://www.youtube.com/watch?v=VIDEO_ID"

Generate 10 clips for Instagram Reels:

python skills/autocut-shorts/scripts/autocut.py video.mp4 --num-clips 10 --platform reels --style reels

Use Gemini for transcription:

python skills/autocut-shorts/scripts/autocut.py video.mp4 --transcription-model gemini

Quick local test with Whisper tiny:

python skills/autocut-shorts/scripts/autocut.py video.mp4 --transcription-model whisper --whisper-model tiny

Use OpenAI Whisper API for word-level captions:

python skills/autocut-shorts/scripts/autocut.py video.mp4 --transcription-model openai --subtitle-mode word

Use Google Speech-to-Text for word-level captions:

python skills/autocut-shorts/scripts/autocut.py video.mp4 --transcription-model google --subtitle-mode word

Custom duration range:

python skills/autocut-shorts/scripts/autocut.py video.mp4 --min-duration 20 --max-duration 45

Use existing transcript:

python skills/autocut-shorts/scripts/autocut.py video.mp4 --transcript-path video.srt --skip-transcribe

Use word timestamps JSON directly:

python skills/autocut-shorts/scripts/autocut.py video.mp4 --word-timestamps-path words.json --subtitle-mode word

scripts/quick_cut.py

Quick cut without full analysis (faster).

Usage:

python skills/autocut-shorts/scripts/quick_cut.py <video_path> [options]

Options:

  • --timestamps: JSON file with timestamps to cut
  • --output-dir: Output directory
  • --platform: Target platform

Example:

python skills/autocut-shorts/scripts/quick_cut.py video.mp4 --timestamps cuts.json

Workflow Steps

Step 1: Download (Optional)

If URL provided:

  • Downloads from YouTube using yt-dlp
  • Best quality MP4
  • Saves to temp directory

Step 2: Transcribe

Extracts audio and transcribes:

  • Auto mode: Chooses between whisper or gemini based on requirements
  • Whisper: Local processing, good for privacy
  • OpenAI Whisper API: Cloud processing with word-level timestamps
  • Google STT: Cloud processing with word-level timestamps and diarization
  • Gemini: Cloud processing, better quality + features

Step 3: Detect Highlights

Runs detection modules:

  • Transcript analysis: Viral phrases, hooks, questions
  • Laughter detection: Funny moments (if enabled)
  • Sentiment analysis: Emotional peaks (if enabled)
  • Scene detection: Visual cut points (if enabled)

Step 4: Score and Rank

Combines all signals:

Virality Score = 
  35% Transcript (hooks, viral content) +
  25% Laughter (humor) +
  25% Sentiment (emotion) +
  15% Scenes (visual transitions)

Ranks all segments and selects top N.

Step 5: Trim

For each highlight:

  • Extends 2-3 seconds before/after for context
  • Trims using FFmpeg (stream copy for speed)
  • Validates duration constraints

Step 6: Resize to Portrait

Converts to 9:16:

  • Smart crop (focus on subjects)
  • 1080x1920 resolution
  • Maintains quality

Step 7: Add Subtitles

Burns in captions:

  • Platform-specific styling
  • Segment captions (SRT/VTT) or word-level karaoke (ASS) when word timestamps are available
  • Safe area padding for TikTok/Shorts UI

Step 8: Export

Saves final clips:

  • Named: {original}_short_{index}.mp4
  • Organized in output directory
  • JSON report with metadata

Output Format

Directory Structure

shorts/
  <video_slug>_<YYYYMMDD-HHMMSS>/
    clip_001/
      master.mp4
      data.json
    clip_002/
      master.mp4
      data.json

JSON Report

{
  "success": true,
  "source": {
    "type": "youtube",
    "url": "https://youtube.com/watch?v=...",
    "title": "Video Title",
    "duration": 1200.5
  },
  "processing": {
    "transcription_model": "gemini-flash-lite-latest",
    "detection_methods": ["transcript", "laughter", "sentiment", "scenes"],
    "platform": "tiktok"
  },
  "results": {
    "total_clips": 5,
    "clips": [
      {
        "rank": 1,
        "filename": "video_short_001.mp4",
        "start_time": 45.2,
        "end_time": 72.5,
        "duration": 27.3,
        "virality_score": 0.92,
        "text": "This is the key moment...",
        "output_path": "shorts/video_short_001.mp4"
      }
    ],
    "total_duration": 135.5,
    "avg_virality_score": 0.78
  },
  "performance": {
    "total_time": 180.5,
    "transcription_time": 45.2,
    "analysis_time": 67.3,
    "processing_time": 68.0
  }
}

Platform Presets

TikTok

  • Resolution: 1080x1920
  • Duration: 15-60 seconds
  • Subtitle style: TikTok
  • Output naming: _tiktok_{index}.mp4

YouTube Shorts

  • Resolution: 1080x1920
  • Duration: 15-60 seconds
  • Subtitle style: Shorts
  • Output naming: _shorts_{index}.mp4

Instagram Reels

  • Resolution: 1080x1920
  • Duration: 15-90 seconds
  • Subtitle style: Reels
  • Output naming: _reels_{index}.mp4

Facebook Reels

  • Resolution: 1080x1920
  • Duration: 15-90 seconds
  • Subtitle style: Default
  • Output naming: _facebook_{index}.mp4

Viral Detection Algorithm

High-Value Signals

Transcript (35% weight):

  • Viral phrases ("you won't believe", "this changes everything")
  • Hooks ("let me tell you", "here's the secret")
  • Questions and answers
  • Story beats

Laughter (25% weight):

  • Explicit laughter markers
  • High-confidence laughter detection
  • Audience reactions

Sentiment (25% weight):

  • Positive emotions (excitement, joy)
  • Surprise moments
  • Negative emotions (controversy, drama)
  • Emotional intensity > 0.7

Scenes (15% weight):

  • Scene transitions
  • Visual changes
  • Topic shifts

Scoring

virality_score = (
    transcript_score * 0.35 +
    laughter_score * 0.25 +
    sentiment_score * 0.25 +
    scene_score * 0.15
)

Premium Clips (0.8-1.0): Must include Excellent Clips (0.6-0.8): High priority Good Clips (0.4-0.6): Consider including

AI Agent Decision Guide

When to Use pyannote vs Gemini Diarization

Default Behavior (--diarization-model auto): The AI agent automatically selects based on context:

# Use pyannote when:
if "podcast" in user_request or "interview" in user_request:
    return "pyannote"  # Multi-speaker, needs accuracy
if "accurate" in user_request or "precise" in user_request:
    return "pyannote"  # User explicitly wants accuracy
if "panel" in user_request or "debate" in user_request:
    return "pyannote"  # Complex multi-speaker scenarios
if "overlapping" in user_request or "talk over" in user_request:
    return "pyannote"  # Overlapping speech detection
if "privacy" in user_request or "offline" in user_request:
    return "pyannote"  # Local processing needed

# Use Gemini when:
if "quick" in user_request or "fast" in user_request:
    return "gemini"  # Speed priority
if "single speaker" in user_request or "monologue" in user_request:
    return "gemini"  # Simple scenario
if "no diarization" in user_request or "skip speakers" in user_request:
    return "none"  # User doesn't want speaker detection

# Default for ambiguous cases:
return "pyannote" if likely_multi_speaker(video) else "gemini"

Decision Matrix:

ScenarioRecommendedReason
Podcast with 2-3 hostspyannoteHigh accuracy for multi-speaker
Interview (host + guest)pyannotePrecise speaker separation
Panel discussionpyannoteHandles 4+ speakers well
Single speaker vloggeminiFaster, good enough
Gaming commentarygeminiUsually 1-2 speakers
Tutorial videogeminiSingle speaker, speed matters
Debate/competitivepyannoteOverlapping speech detection
Privacy-sensitivepyannoteLocal processing

Examples by Use Case:

# Podcast - use pyannote automatically
python skills/autocut-shorts/scripts/autocut.py podcast.mp4

# Interview - use pyannote for accuracy
python skills/autocut-shorts/scripts/autocut.py interview.mp4

# Vlog - use gemini (single speaker, faster)
python skills/autocut-shorts/scripts/autocut.py vlog.mp4

# Force pyannote explicitly
python skills/autocut-shorts/scripts/autocut.py video.mp4 --diarization-model pyannote

# Skip diarization for simple content
python skills/autocut-shorts/scripts/autocut.py tutorial.mp4 --diarization-model none

# Extract only host's segments
python skills/autocut-shorts/scripts/autocut.py podcast.mp4 --focus-speaker SPEAKER_00

Smart Defaults

The agent automatically detects:

  1. Content type (podcast, vlog, tutorial, gaming, etc.)
  2. Likely speaker count based on audio patterns
  3. User priority (speed vs accuracy vs privacy)
  4. Available resources (GPU, internet, API keys)

Override any time: Users can always override with --diarization-model flag.

Integration

This skill uses all other skills:

  • youtube-downloader: Download from URL
  • video-transcriber: Transcribe audio
  • scene-detector: Find visual cut points
  • laughter-detector: Find funny moments
  • sentiment-analyzer: Find emotional peaks
  • highlight-scanner: Combine all signals
  • video-trimmer: Cut segments
  • portrait-resizer: Convert to 9:16
  • subtitle-overlay: Add captions

Common Use Cases

Podcast to Shorts

python skills/autocut-shorts/scripts/autocut.py podcast.mp4 --num-clips 10 --platform shorts

Vlog Highlights

python skills/autocut-shorts/scripts/autocut.py vlog.mp4 --num-clips 5 --platform tiktok

YouTube to TikTok

python skills/autocut-shorts/scripts/autocut.py "https://youtube.com/watch?v=..." --platform tiktok

Tutorial Clips

python skills/autocut-shorts/scripts/autocut.py tutorial.mp4 --min-duration 30 --max-duration 60

Performance

Processing Time (approximate):

  • 1-minute video: ~30-60 seconds
  • 10-minute video: ~3-5 minutes
  • 30-minute video: ~8-12 minutes
  • 1-hour video: ~15-25 minutes

Breakdown:

  • Download: 5-30 seconds (depends on video)
  • Transcription: 20-60 seconds
  • Detection: 10-30 seconds per method
  • Trimming: 1-5 seconds per clip
  • Resizing: 5-10 seconds per clip
  • Subtitles: 5-10 seconds per clip

Error Handling

  • Download failure: Retries up to 3 times
  • Transcription failure: Falls back to alternative model
  • No highlights found: Returns error with suggestions
  • Processing failure: Reports which step failed
  • Partial success: Reports successful clips vs failed

Tips

  • Use Gemini transcription for best highlight detection
  • Provide more clips requested than needed (filter by score)
  • 15-30 second clips perform best on TikTok
  • 30-60 second clips work well for Shorts/Reels
  • Keep 2-3 second buffer around highlights
  • Test different platforms for best engagement
  • Use transcript-only mode for faster processing
  • Batch process multiple videos for efficiency

References

Skills Info
Original Name:autocut-shortsAuthor:akrindev