Agent Skill
2/7/2026document-skillspdf
Comprehensive PDF manipulation toolkit for extracting text and tables, creating new PDFs, merging/splitting documents, and handling forms. Use when programmatically processing, generating, or analyzing PDF documents.
T
tankygranny05
0GitHub Stars
1Views
npx skills add tankygranny05/agent-box
SKILL.md
| Name | document-skillspdf |
| Description | Comprehensive PDF manipulation toolkit for extracting text and tables, creating new PDFs, merging/splitting documents, and handling forms. Use when programmatically processing, generating, or analyzing PDF documents. |
name: document-skills:pdf description: Comprehensive PDF manipulation toolkit for extracting text and tables, creating new PDFs, merging/splitting documents, and handling forms. Use when programmatically processing, generating, or analyzing PDF documents.
PDF Document Skills
Overview
This skill provides comprehensive PDF manipulation capabilities using Python libraries: PyPDF2 for basic operations, pdfplumber for text extraction, and reportlab for PDF creation.
When to Use
- Extracting text and tables from PDFs
- Creating new PDF documents
- Merging or splitting PDF files
- Filling PDF forms
- Adding watermarks or annotations
- Converting content to PDF format
Dependencies
pip install PyPDF2 pdfplumber reportlab pypdf
Quick Reference
Extract text from PDF
import pdfplumber
with pdfplumber.open('input.pdf') as pdf:
for page in pdf.pages:
text = page.extract_text()
print(text)
Extract tables from PDF
import pdfplumber
with pdfplumber.open('input.pdf') as pdf:
for page in pdf.pages:
tables = page.extract_tables()
for table in tables:
for row in table:
print(row)
Merge PDFs
from PyPDF2 import PdfMerger
merger = PdfMerger()
merger.append('file1.pdf')
merger.append('file2.pdf')
merger.append('file3.pdf')
merger.write('merged.pdf')
merger.close()
Split PDF
from PyPDF2 import PdfReader, PdfWriter
reader = PdfReader('input.pdf')
# Extract specific pages
writer = PdfWriter()
writer.add_page(reader.pages[0]) # First page
writer.add_page(reader.pages[2]) # Third page
with open('extracted.pdf', 'wb') as output:
writer.write(output)
Create PDF with reportlab
from reportlab.lib.pagesizes import letter
from reportlab.pdfgen import canvas
from reportlab.lib.units import inch
c = canvas.Canvas('output.pdf', pagesize=letter)
width, height = letter
# Add text
c.setFont('Helvetica-Bold', 24)
c.drawString(1*inch, height - 1*inch, 'Document Title')
c.setFont('Helvetica', 12)
c.drawString(1*inch, height - 2*inch, 'This is paragraph text.')
# Add line
c.line(1*inch, height - 2.5*inch, 7.5*inch, height - 2.5*inch)
c.save()
Create PDF with tables (reportlab)
from reportlab.lib.pagesizes import letter
from reportlab.platypus import SimpleDocTemplate, Table, TableStyle
from reportlab.lib import colors
doc = SimpleDocTemplate('output.pdf', pagesize=letter)
elements = []
data = [
['Header 1', 'Header 2', 'Header 3'],
['Row 1 Col 1', 'Row 1 Col 2', 'Row 1 Col 3'],
['Row 2 Col 1', 'Row 2 Col 2', 'Row 2 Col 3'],
]
table = Table(data)
table.setStyle(TableStyle([
('BACKGROUND', (0, 0), (-1, 0), colors.grey),
('TEXTCOLOR', (0, 0), (-1, 0), colors.whitesmoke),
('ALIGN', (0, 0), (-1, -1), 'CENTER'),
('FONTNAME', (0, 0), (-1, 0), 'Helvetica-Bold'),
('FONTSIZE', (0, 0), (-1, 0), 14),
('BOTTOMPADDING', (0, 0), (-1, 0), 12),
('BACKGROUND', (0, 1), (-1, -1), colors.beige),
('GRID', (0, 0), (-1, -1), 1, colors.black),
]))
elements.append(table)
doc.build(elements)
Add watermark
from PyPDF2 import PdfReader, PdfWriter
from reportlab.pdfgen import canvas
from reportlab.lib.pagesizes import letter
from io import BytesIO
# Create watermark
packet = BytesIO()
c = canvas.Canvas(packet, pagesize=letter)
c.setFont('Helvetica', 60)
c.setFillColorRGB(0.5, 0.5, 0.5, 0.3)
c.rotate(45)
c.drawString(200, 100, 'WATERMARK')
c.save()
packet.seek(0)
watermark = PdfReader(packet)
reader = PdfReader('input.pdf')
writer = PdfWriter()
for page in reader.pages:
page.merge_page(watermark.pages[0])
writer.add_page(page)
with open('watermarked.pdf', 'wb') as output:
writer.write(output)
Get PDF metadata
from PyPDF2 import PdfReader
reader = PdfReader('input.pdf')
meta = reader.metadata
print(f'Title: {meta.title}')
print(f'Author: {meta.author}')
print(f'Pages: {len(reader.pages)}')
Rotate pages
from PyPDF2 import PdfReader, PdfWriter
reader = PdfReader('input.pdf')
writer = PdfWriter()
for page in reader.pages:
page.rotate(90) # Rotate 90 degrees clockwise
writer.add_page(page)
with open('rotated.pdf', 'wb') as output:
writer.write(output)
Encrypt PDF
from PyPDF2 import PdfReader, PdfWriter
reader = PdfReader('input.pdf')
writer = PdfWriter()
for page in reader.pages:
writer.add_page(page)
writer.encrypt('user_password', 'owner_password')
with open('encrypted.pdf', 'wb') as output:
writer.write(output)
Tips
- Use pdfplumber for accurate text extraction (better than PyPDF2)
- Use reportlab for creating complex PDFs with precise control
- PyPDF2 is best for merging, splitting, and basic manipulation
- For forms, consider pdfrw or PyMuPDF (fitz)
- OCR requires additional tools like pytesseract for scanned PDFs
Skills Info
Original Name:document-skillspdfAuthor:tankygranny05
Download