Agent Skill
2/7/2026

gemini-video-understanding

Analyze and understand videos using Google's Gemini API. Use when the user asks to analyze, understand, describe, summarize, transcribe, or extract information from videos. Supports local video files (MP4, MOV, WebM, etc.) and YouTube URLs. Can answer questions about video content, describe scenes, identify objects/people/actions, extract text/timestamps, and more. Use this skill when user provides a video file path or YouTube link and wants to understand its content.

F
ferdousbhai
0GitHub Stars
1Views
npx skills add ferdousbhai/gemini-video-understanding

SKILL.md

Namegemini-video-understanding
DescriptionAnalyze and understand videos using Google's Gemini API. Use when the user asks to analyze, understand, describe, summarize, transcribe, or extract information from videos. Supports local video files (MP4, MOV, WebM, etc.) and YouTube URLs. Can answer questions about video content, describe scenes, identify objects/people/actions, extract text/timestamps, and more. Use this skill when user provides a video file path or YouTube link and wants to understand its content.

name: gemini-video-understanding description: Analyze and understand videos using Google's Gemini API. Use when the user asks to analyze, understand, describe, summarize, transcribe, or extract information from videos. Supports local video files (MP4, MOV, WebM, etc.) and YouTube URLs. Can answer questions about video content, describe scenes, identify objects/people/actions, extract text/timestamps, and more. Use this skill when user provides a video file path or YouTube link and wants to understand its content.

Gemini Video Understanding

uv run ~/.claude/skills/gemini-video-understanding/scripts/understand_video.py \
  --source "path/to/video.mp4" \
  --prompt "Describe what happens in this video" \
  [--fast] \
  [--fps 1] \
  [--start 0] \
  [--end 60]

Source Types

  • Local files: --source "video.mp4" — MP4, MOV, WebM, AVI, MKV, etc.
  • YouTube: --source "https://youtube.com/watch?v=..." — Any YouTube URL

Options

OptionDescription
--fastUse faster flash model instead of pro
--fpsFrame rate sampling (default: 1 fps). Use 2+ for fast action
--startStart time in seconds for clipping
--endEnd time in seconds for clipping

Example Prompts

  • "Summarize this video in 3 bullet points"
  • "What text appears on screen? Include timestamps"
  • "Describe the main actions and events with timestamps"
  • "Identify all people and what they're doing"

API Key

Uses GEMINI_API_KEY env var, or pass --api-key KEY.

Updating Models

Edit ~/.claude/skills/gemini-video-understanding/config.json:

{
  "default_model": "gemini-3-pro-preview",
  "fast_model": "gemini-3-flash-preview"
}
Skills Info
Original Name:gemini-video-understandingAuthor:ferdousbhai