tools
Repository-wide tooling including init wrapper, pack/publish utilities, and all helper scripts. Links to tools-special and tools-templates for subfolder details.
Repository Tools Overview
Continuous improvement: see continuous_improvement.md in this folder for notes on tool behaviour, past feedback, and update procedures.
Use this skill when working with repository-wide tools, understanding tool architecture, or coordinating workflows across multiple tool categories.
Tool categories
The tools/ directory contains all helper scripts and utilities:
Core scripts (tools/)
- init.py: Async wrapper for pytextgen with mtime/inode caching
  - Discovers changed .md files (excludes .git, .obsidian, tools)
  - Caches (mtime, inode, text) to skip unchanged files
  - Normalizes newlines to \n before pytextgen processing
  - Passes through pytextgen flags (-C, --no-code-cache, --init-flashcards)
  - Commands: uv run -m init generate, uv run -m init clear
- convert wiki.py: Wikipedia HTML → Markdown converter
  - Reads HTML from clipboard
  - Normalizes links (relative paths with %20 encoding)
  - Downloads media to archives/Wikimedia Commons/
  - Uses convert wiki.py.names map.json for filename renames
  - Preserves Wikipedia attribution
  - Command: uv run -m "convert wiki"
- pack.py: PageRank-ordered zip bundling
  - Walks Markdown links to build dependency graph
  - Computes PageRank to prioritize important files
  - Creates zip bundle with metadata (ranks, omissions, link closure)
  - Configurable: damping factor, iterations, max files
  - Command: python -m pack -o pack.zip -n 25 --damping-factor 0.5 --page-rank-iterations 100 <paths>
- publish.py: Private → public history mirroring via git filter-repo
  - Clones private/.git temporarily
  - Runs git filter-repo with property Private-commit filtering
  - Rewrites commit history to remove sensitive paths
  - Rebases with signing, adds remote to public .git
  - Command: python -m publish --paths-file <file> (with literal:<path> lines)
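The link-walking and ranking steps listed for pack.py can be illustrated with a small sketch. This is not pack.py's actual code: the helper names (extract_links, page_rank), the link regex, the sample notes, and the handling of pages without outgoing links are all assumptions made for illustration. Only the damping factor and iteration count mirror the documented options.

```python
import re
from urllib.parse import unquote

# Matches the target of an inline Markdown link: [text](target)
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def extract_links(text):
    """Return percent-decoded link targets found in Markdown text (hypothetical helper)."""
    return [unquote(target) for target in LINK_RE.findall(text)]


def page_rank(graph, damping=0.5, iterations=100):
    """Power-iteration PageRank over {node: [linked nodes]}.

    Assumption: nodes without outgoing links spread their rank evenly
    over all nodes, so the ranks always sum to 1.
    """
    nodes = sorted(set(graph) | {t for targets in graph.values() for t in targets})
    n = len(nodes)
    if n == 0:
        return {}
    ranks = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        dangling = sum(ranks[node] for node in nodes if not graph.get(node))
        base = (1.0 - damping) / n + damping * dangling / n
        new_ranks = {node: base for node in nodes}
        for node in nodes:
            targets = graph.get(node, [])
            for target in targets:
                new_ranks[target] += damping * ranks[node] / len(targets)
        ranks = new_ranks
    return ranks


# Hypothetical notes: file name -> Markdown text
notes = {
    "a.md": "See [B](b.md) and [C](c%20d.md).",
    "b.md": "Back to [C](c%20d.md).",
    "c d.md": "No links here.",
}
graph = {name: extract_links(text) for name, text in notes.items()}
ranks = page_rank(graph)
# Highest-ranked files first, as a bundler with a file limit might prefer them
ordered = sorted(ranks, key=ranks.get, reverse=True)
```

In this toy graph the most linked-to note ranks first, so a bundle limited to a few files would keep it ahead of the notes that merely link out.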
Subfolder tools
- tools/special/: Academic LMS converters and course management (see tools-special skill)
  - Canvas/HKUST Zinc converters
  - Course catalog fetchers
- tools/templates/: Note scaffolding and pytextgen templates (see tools-templates skill)
  - new wiki page.py
  - pytextgen generate *.md templates
Submodule tools
- tools/pytextgen/: Git submodule for content generation library (see pytextgen skill)
- tools/pyarchivist/: Git submodule for archiving tool (see pyarchivist skill)
When to use this skill
- Understanding tool relationships and dependencies
- Coordinating multi-tool workflows (e.g., wiki ingestion → generation → packaging)
- Troubleshooting tool interactions
- Planning new tool development or refactoring
Common workflows
End-to-end wiki ingestion
- Scaffold note: uv run -m "templates.new wiki page" (tools-templates)
- Ingest HTML: uv run -m "convert wiki" (convert wiki.py)
- Flashcard generation is automatic; do not run uv run -m init generate. Build workflows will handle it.
Academic course organization
- Convert LMS export: python -m tools.special."convert Canvas submission" (tools-special)
- Update index: Edit special/academia/<Institution>/index.md
- Add pytextgen fences; regeneration is handled by the build system and should not be invoked manually.
Packaging and publishing
- Regeneration of generated content is automatic and occurs as part of the build; manual invocation (uv run -m init generate) is not required.
- Package bundle: uv run -m pack -o bundle.zip -n 50 <paths> (pack.py)
- Publish filtered history: uv run -m publish --paths-file paths.txt (publish.py)
Archive management
- Archive content: Use pyarchivist tool (pyarchivist skill)
- Verify index: Check that archives/*/index.md was updated
- Reference in notes: Add relative links with %20 encoding
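As a rough illustration of the last step, a relative link with %20 encoding can be built with the standard library alone. The function name relative_link and the example paths are hypothetical; the repository's tools may construct links differently.

```python
import os
from urllib.parse import quote


def relative_link(from_note, to_target):
    """Build a percent-encoded relative link from a note to another file."""
    start = os.path.dirname(from_note) or "."
    rel = os.path.relpath(to_target, start=start)
    # Use forward slashes on every platform, then encode spaces as %20.
    return quote(rel.replace(os.sep, "/"), safe="/")


# Hypothetical note and archived media paths, relative to the repository root
link = relative_link(
    "general/Example note.md",
    "archives/Wikimedia Commons/Example image.png",
)
# link is "../archives/Wikimedia%20Commons/Example%20image.png"
```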
Tool dependencies
Python requirements
See requirements.txt for package dependencies:
- anyio: Async I/O for init.py
- aiohttp: HTTP client for downloads
- bs4 (BeautifulSoup): HTML parsing for convert wiki.py
- PyYAML: YAML frontmatter parsing
- pytextgen: Content generation library
- Others as needed
Install: pip install -r requirements.txt
External tools
- git: Required for all workflows
- git-filter-repo: Required for publish.py
- Python 3.8+: Minimum version for async features
Git submodules
- tools/pytextgen/: Content generation engine
- tools/pyarchivist/: Archiving tool
- self/**: Personal metadata submodules
- private/**: Private content submodule
Agent‑internal scripts policy
The repository occasionally contains small helper scripts used internally by the
AI agent (for example, .github/scripts/validate-skills.py). These are
not meant to be installed as CI tools, exposed to users, or added to
package.json/requirements.txt.
- Avoid creating similar "agent‑only" utilities without explicit approval from the repository owner.
- If an automated validation step or helper script is genuinely needed by the
project, propose a formal PR and discuss with the owner before adding it to
CI or packaging; the default assumption is that such tools belong in the
.github/scripts/ directory and are not executed in production workflows.
Tool architecture
init.py caching strategy
- On first run: Walk workspace, compute (mtime, inode) for all .md files
- Cache text and metadata in memory
- On subsequent runs: Skip files with unchanged (mtime, inode)
- Normalize newlines to \n before passing to pytextgen
- Use -C/--no-cached to rebuild cache
Cache location: In-memory (not persisted to disk)
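The strategy above can be sketched in a few lines of synchronous Python. This is a simplified illustration, not the init.py implementation: the TextCache class, its method names, and the use of nanosecond mtimes are assumptions, and the real wrapper is asynchronous.

```python
import os
import tempfile


class TextCache:
    """In-memory cache; an entry is valid while (mtime, inode) is unchanged."""

    def __init__(self):
        self._entries = {}  # path -> (mtime_ns, inode, text)

    def read(self, path):
        """Return (text, was_cached), re-reading the file only if it changed."""
        stat = os.stat(path)
        key = (stat.st_mtime_ns, stat.st_ino)
        entry = self._entries.get(path)
        if entry is not None and entry[:2] == key:
            return entry[2], True
        # newline="" disables newline translation so we normalize explicitly
        with open(path, encoding="utf-8", newline="") as handle:
            raw = handle.read()
        # Normalize CRLF and lone CR to LF before further processing
        text = raw.replace("\r\n", "\n").replace("\r", "\n")
        self._entries[path] = (*key, text)
        return text, False

    def clear(self):
        """Drop every entry, forcing the next read to hit the disk."""
        self._entries.clear()


with tempfile.TemporaryDirectory() as tmp:
    note = os.path.join(tmp, "note.md")
    with open(note, "w", encoding="utf-8", newline="") as handle:
        handle.write("line one\r\nline two\r\n")
    cache = TextCache()
    first = cache.read(note)   # read from disk and normalized
    second = cache.read(note)  # served from the in-memory cache
    cache.clear()
    third = cache.read(note)   # read from disk again after clearing
```

Because the cache lives only in memory, it starts empty in every new process, which matches the note that nothing is persisted to disk.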
pytextgen compile cache
- Compiled Python modules cached in __pycache__/
- Use --no-code-cache to bypass
- Clear with rm -rf __pycache__/ if stale
Git filter-repo workflow
- Clone private/.git to temporary directory
- Run git filter-repo --path-rename based on --paths-file
- Filter commits by Private-commit property
- Rebase and sign commits
- Add temporary remote to public .git
- User must manually push
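The paths file consumed by --paths-file is described elsewhere in this document as containing literal:<path> lines. A minimal sketch of producing such a file follows; the helper name write_paths_file and the example paths are hypothetical, and any further syntax publish.py may accept is not covered here.

```python
import os
import tempfile


def write_paths_file(paths, destination):
    """Write one literal:<path> line per path (hypothetical helper)."""
    lines = []
    for path in paths:
        if "\n" in path or "\r" in path:
            raise ValueError(f"path contains a line break: {path!r}")
        lines.append(f"literal:{path}")
    with open(destination, "w", encoding="utf-8", newline="\n") as handle:
        handle.writelines(line + "\n" for line in lines)
    return lines


with tempfile.TemporaryDirectory() as tmp:
    paths_file = os.path.join(tmp, "paths.txt")
    # Hypothetical paths to keep in the published history
    write_paths_file(["general/Example note.md", "archives/index.md"], paths_file)
    with open(paths_file, encoding="utf-8") as handle:
        written = handle.read()
```

The resulting file could then be passed as uv run -m publish --paths-file paths.txt, after confirming that every listed path is safe to publish.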
CLI stability
Critical: Core tools have established interfaces; preserve:
- Argument names and order
- Expected input formats (clipboard, files, stdin)
- Output formats (Markdown, YAML, CSV, ZIP)
- Error codes and messages
If changes are needed, ask user for permission first.
Best practices
Tool coordination
- Regenerate before packaging: Generated content is normally kept fresh by the build process; manual uv run -m init generate -C is rarely needed and agents should not perform it.
- Clean before publishing: Verify private/ content is properly filtered before publish.py
- Archive before ingestion: Use pyarchivist for media before manual note creation
- Template before conversion: Scaffold frontmatter before ingesting content
Error handling
- Check tool exit codes before proceeding
- Verify file existence before processing
- Validate YAML/HTML/CSV formats before parsing
- Use --dry-run or preview modes when available
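Checking exit codes between workflow steps might look like the sketch below. The run_step helper is hypothetical, and the commands are stand-ins that call the current Python interpreter rather than the repository tools, so the example runs anywhere.

```python
import subprocess
import sys


def run_step(command):
    """Run one workflow step; raise if it exits with a non-zero status."""
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            f"{command[0]} exited with {result.returncode}: {result.stderr.strip()}"
        )
    return result.stdout


# A step that succeeds: its output is returned for the next step to use
ok_output = run_step([sys.executable, "-c", "print('ok')"])

# A step that fails: the error stops the workflow before later steps run
failure = ""
try:
    run_step([sys.executable, "-c", "import sys; sys.exit(3)"])
except RuntimeError as error:
    failure = str(error)
```

Raising on the first failure keeps a later step, such as packaging, from running on the output of a step that did not finish.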
Performance
- Use init.py caching to skip unchanged files
- Parallelize independent operations (e.g., multiple convert wiki runs)
- Limit PageRank iterations in pack.py for large graphs
- Use --exclude-extension in pack.py to skip large assets
When to ask for help
- If tool behavior is unexpected, consult user or check tool documentation
- If editing submodules (pytextgen, pyarchivist), ask user for permission
- If new tool is needed, discuss requirements and architecture with user
- If tool fails mysteriously, check Python version and dependencies
Common issues
- Cache staleness: Use -C/--no-cached if init.py skips changed files
- Module import errors: Ensure requirements.txt packages are installed
- Git submodule out of date: Run git submodule update --remote
- Path encoding issues: Ensure %20 encoding for spaces in links
- Clipboard access: Some tools require clipboard support (may fail in headless environments)
Editing guidelines for tools/**/*.py
When editing Python helper scripts in tools/:
- Preserve CLI surfaces: Keep argument names, defaults, and help text stable; avoid breaking uv run -m init generate/clear, uv run -m pack, uv run -m publish, and other tool entrypoints
- Maintain async/anyio patterns: Preserve async patterns using AnyIO APIs such as anyio.Path and anyio.Semaphore, but prefer importing helpers from Asyncer (e.g. from asyncer import create_task_group, soonify, asyncify) for better typing and editor support. The init wrapper has been refactored accordingly and includes its own thin _gather.
- Keep submodules read-only unless requested: tools/pytextgen, tools/pyarchivist, self/**, private/** are git submodules; ask user before editing
- Normalize newlines: When touching the init wrapper, normalize to \n; do not bypass its exclusion list (.git, .obsidian, tools)
- Favor relative imports: Use relative imports within the tools package; do not hardcode absolute host paths
- Update requirements.txt: When adding dependencies, update requirements.txt and note any non-pip prerequisites; prefer lightweight stdlib/typed solutions
- Test thoroughly: Python tools are critical infrastructure; test changes carefully before committing