Agent Skill
2/7/2026

document-conversion

Convert DOC/DOCX/PDF/PPT/PPTX documents to Markdown format. Automatically detect PDF type (electronic/scanned), extract images to separate directory. Use this Skill when administrator onboards non-Markdown documents. Trigger condition: Onboard DOC/DOCX/PDF/PPT/PPTX format files.

H
harryoung
91GitHub Stars
1Views
npx skills add Harryoung/efka

SKILL.md

Namedocument-conversion
DescriptionConvert DOC/DOCX/PDF/PPT/PPTX documents to Markdown format. Automatically detect PDF type (electronic/scanned), extract images to separate directory. Use this Skill when administrator onboards non-Markdown documents. Trigger condition: Onboard DOC/DOCX/PDF/PPT/PPTX format files.

name: document-conversion description: Convert DOC/DOCX/PDF/PPT/PPTX documents to Markdown format. Automatically detect PDF type (electronic/scanned), extract images to separate directory. Use this Skill when administrator onboards non-Markdown documents. Trigger condition: Onboard DOC/DOCX/PDF/PPT/PPTX format files.

Document Format Conversion

Convert various document formats to Markdown for knowledge base onboarding.

Supported Formats

FormatProcessing Method
DOCXPandoc conversion, preserve formatting and images
DOCLibreOffice → DOCX → Pandoc
PDF ElectronicPyMuPDF4LLM fast conversion
PDF ScannedPaddleOCR-VL online OCR
PPTXpptx2md professional conversion
PPTLibreOffice → PPTX → pptx2md

Usage

python .claude/skills/document-conversion/scripts/smart_convert.py \
    <temp_path> \
    --original-name "<original_filename>" \
    --json-output

Parameters:

  • <temp_path>: Temporary file path (e.g. /tmp/kb_upload_xxx.pptx)
  • --original-name: Must pass original filename, used to generate correct image directory name
  • --json-output: Output JSON format result

Output Format

{
  "success": true,
  "markdown_file": "/path/to/output.md",
  "images_dir": "original_filename_images",
  "image_count": 5,
  "input_file": "/path/to/input.pptx"
}

Processing Flow

  1. Execute conversion command (must use --original-name and --json-output)
  2. Parse JSON output, check success field
  3. If success: false, report error and end
  4. If success: true, record generated file path and image directory

Important Notes

  • Image directory uses original filename naming (e.g. 培训资料_images/)
  • Not passing --original-name will cause incorrect image reference paths
  • PDF type is automatically detected, scanned version processing is slower (tens of seconds to minutes)

Format Details

Detailed processing instructions for each format, see FORMATS.md

Skills Info
Original Name:document-conversionAuthor:harryoung