Agent Skill
2/7/2026

document-generation

Runtime instructions for generating technical documentation from the COBOL Knowledge Graph. Use this skill whenever the user requests document generation, program reports, system overviews, or any output derived from Neo4j KG data via the DocumentGeneratorAgent.

vin082
npx skills add vin082/LegacyCobolInsights

SKILL.md


Document Generation Skill

This skill guides document generation from the COBOL Knowledge Graph. It covers query strategies, graph schema, enrichment-aware logic, output structure, and export formats used by DocumentGeneratorAgent.


1. Knowledge Graph Schema Reference

All data is stored in Neo4j. Use these node labels, properties, and relationships when constructing Cypher queries for documentation.

Node Types

| Label | Key Properties | Description |
| --- | --- | --- |
| CobolProgram | name, domain, description, complexity_score, loc, code, business_logic, file_path | Main COBOL program unit |
| Procedure | name, type, description | Paragraph or section within a program |
| DataFile | name, description | External file accessed by programs |
| Copybook | name | COBOL copybook (shared data definitions) |
| Job | name, description | JCL job definition |
| Transaction | name | CICS transaction |
| ScreenMap | name | BMS screen definition |
| Variable | name, type, level | Data item declared in a program |

Relationship Types

| Relationship | From → To | Meaning |
| --- | --- | --- |
| CALLS | CobolProgram → CobolProgram | Program calls another program |
| READS | CobolProgram → DataFile | Program reads a data file |
| WRITES | CobolProgram → DataFile | Program writes to a data file |
| CONTAINS | CobolProgram → Procedure | Program contains a procedure/paragraph |
| INCLUDES | CobolProgram → Copybook | Program includes a copybook |
| DECLARES_VARIABLE | CobolProgram → Variable | Program declares a variable |
| EXECUTES | Job → CobolProgram | JCL job executes a program |
| ALLOCATES | Job → DataFile | JCL job allocates a dataset |
| INVOKES | Transaction → CobolProgram | CICS transaction invokes a program |
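The relationship table can also be kept as data so that query-building code can check relationship names before sending a query. This is a minimal sketch; `REL_SCHEMA` and `rel_pattern` are hypothetical names, not part of the project, and agents/graph_builder.py remains the real source of truth.

```python
# Hypothetical helper: the relationship table above expressed as data.
# agents/graph_builder.py remains the real source of truth for the schema.
REL_SCHEMA: dict[str, tuple[str, str]] = {
    "CALLS": ("CobolProgram", "CobolProgram"),
    "READS": ("CobolProgram", "DataFile"),
    "WRITES": ("CobolProgram", "DataFile"),
    "CONTAINS": ("CobolProgram", "Procedure"),
    "INCLUDES": ("CobolProgram", "Copybook"),
    "DECLARES_VARIABLE": ("CobolProgram", "Variable"),
    "EXECUTES": ("Job", "CobolProgram"),
    "ALLOCATES": ("Job", "DataFile"),
    "INVOKES": ("Transaction", "CobolProgram"),
}


def rel_pattern(rel_type: str, src: str = "a", dst: str = "b") -> str:
    """Build a Cypher pattern like (a:Job)-[:EXECUTES]->(b:CobolProgram).

    Unknown relationship types raise KeyError, so a typo fails loudly
    instead of silently matching zero rows.
    """
    if rel_type not in REL_SCHEMA:
        raise KeyError(f"Unknown relationship type: {rel_type}")
    from_label, to_label = REL_SCHEMA[rel_type]
    return f"({src}:{from_label})-[:{rel_type}]->({dst}:{to_label})"


print(rel_pattern("EXECUTES", "j", "p"))  # (j:Job)-[:EXECUTES]->(p:CobolProgram)
```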

2. Document Types & When to Use Them

| doc_type | Use When | Key Data Sources |
| --- | --- | --- |
| system_overview | User wants a full catalog or summary of all (or filtered) programs | All CobolProgram nodes plus aggregated stats |
| program_detail | User asks about a specific program | A single CobolProgram plus its dependencies, data flows, procedures, and business logic |
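The selection rule above reduces to one check: whether the request names a specific program. A minimal sketch, assuming the agent has already extracted an optional program name from the request; `choose_doc_type` and the program name `ACCTUPD` are hypothetical.

```python
from typing import Optional


def choose_doc_type(program_name: Optional[str]) -> str:
    """Pick a doc_type per the table above (hypothetical helper).

    A request that names a specific program gets program_detail;
    anything else is treated as a catalog/summary request.
    """
    if program_name and program_name.strip():
        return "program_detail"
    return "system_overview"


print(choose_doc_type("ACCTUPD"))  # program_detail
print(choose_doc_type(None))       # system_overview
```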

3. Core Query Strategies

3.1 Gathering Program Data (System Overview)

Use this pattern to fetch programs with aggregated metrics in a single query. Apply WHERE filters for domain/complexity, and always include a LIMIT (default 10) for performance:

```cypher
MATCH (p:CobolProgram)
// Optional filters: WHERE p.domain = 'Banking' AND p.complexity_score >= 30
// Aggregate after each OPTIONAL MATCH so the optional patterns do not
// multiply into a cartesian product before counting.
OPTIONAL MATCH (p)-[:CALLS]->(called:CobolProgram)
WITH p, COUNT(DISTINCT called) AS calls_out
OPTIONAL MATCH (caller:CobolProgram)-[:CALLS]->(p)
WITH p, calls_out, COUNT(DISTINCT caller) AS calls_in
OPTIONAL MATCH (p)-[:CONTAINS]->(proc:Procedure)
WITH p, calls_out, calls_in, COUNT(DISTINCT proc) AS procedure_count
OPTIONAL MATCH (p)-[r:READS|WRITES]->(f:DataFile)
WITH p, calls_out, calls_in, procedure_count,
     COUNT(DISTINCT CASE WHEN type(r) = 'READS' THEN f END) AS files_read,
     COUNT(DISTINCT CASE WHEN type(r) = 'WRITES' THEN f END) AS files_written

RETURN p.name AS program_name,
       COALESCE(p.domain, 'Not Enriched') AS domain,
       p.description AS description,
       COALESCE(p.complexity_score, 0) AS complexity,
       COALESCE(p.loc, 0) AS loc,
       calls_out, calls_in, procedure_count, files_read, files_written
ORDER BY COALESCE(p.complexity_score, calls_in + calls_out) DESC
LIMIT 10
```
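The commented-out WHERE line and the fixed LIMIT can be filled in programmatically. A minimal sketch, assuming the caller passes Cypher conditions that reference query parameters (such as `$domain`) rather than literal values, which avoids Cypher injection from user-supplied strings; `with_filters_and_limit` is a hypothetical name, and whether `neo4j_client.query` accepts a parameter map is an assumption to verify.

```python
def with_filters_and_limit(
    base_query: str, conditions: list[str], limit: int = 10
) -> tuple[str, dict]:
    """Insert a WHERE clause after the MATCH line and parameterize LIMIT.

    Assumes base_query starts with the MATCH (p:CobolProgram) line and may
    end with a literal LIMIT line, as the query above does.
    """
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    lines = base_query.strip().splitlines()
    if conditions:
        # Conditions must reference parameters (e.g. $domain), not raw values.
        lines.insert(1, "WHERE " + " AND ".join(conditions))
    if lines[-1].strip().upper().startswith("LIMIT"):
        lines[-1] = "LIMIT $limit"  # replace the hard-coded LIMIT 10
    else:
        lines.append("LIMIT $limit")
    return "\n".join(lines), {"limit": limit}


base = "MATCH (p:CobolProgram)\nRETURN p.name AS program_name\nLIMIT 10"
cypher, params = with_filters_and_limit(base, ["p.domain = $domain"], limit=5)
params["domain"] = "Banking"  # caller supplies values for its own conditions
print(cypher)
```

Neo4j accepts a parameter for LIMIT, so the same query text can be reused with different limits.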

3.2 Deep Dive into a Single Program

Run each of these queries for a full program profile, substituting the target program's name for PROGRAM_NAME:

```cypher
// Program metadata
MATCH (p:CobolProgram {name: 'PROGRAM_NAME'})
RETURN p.name, p.domain, p.description, p.complexity_score, p.loc, p.business_logic

// Outbound calls
MATCH (p:CobolProgram {name: 'PROGRAM_NAME'})-[:CALLS]->(called:CobolProgram)
RETURN called.name, called.domain, called.complexity_score

// Inbound calls
MATCH (caller:CobolProgram)-[:CALLS]->(p:CobolProgram {name: 'PROGRAM_NAME'})
RETURN caller.name, caller.domain

// Data file operations
MATCH (p:CobolProgram {name: 'PROGRAM_NAME'})-[r]->(f:DataFile)
WHERE type(r) IN ['READS', 'WRITES']
RETURN f.name, type(r) AS operation, f.description

// Internal procedures
MATCH (p:CobolProgram {name: 'PROGRAM_NAME'})-[:CONTAINS]->(proc:Procedure)
RETURN proc.name, proc.type, proc.description
ORDER BY proc.name LIMIT 100

// Copybooks included
MATCH (p:CobolProgram {name: 'PROGRAM_NAME'})-[:INCLUDES]->(c:Copybook)
RETURN c.name

// JCL jobs that execute this program
MATCH (j:Job)-[:EXECUTES]->(p:CobolProgram {name: 'PROGRAM_NAME'})
RETURN j.name, j.description
```
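A minimal sketch of running the profile queries from Python, assuming a runner callable that takes a Cypher string plus a parameter map and returns row dicts (adapting utils/neo4j_client.py to that shape is an assumption). A `$name` parameter replaces the PROGRAM_NAME placeholder, and a failed query yields an empty list rather than a crash, following the error-handling rules in section 9. `gather_profile`, `PROFILE_QUERIES`, and `fake_run` are hypothetical; only two of the queries are shown.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)

# Hypothetical runner signature: (cypher, params) -> list of row dicts.
Runner = Callable[[str, dict], list]

# Two of the profile queries above, rewritten with a $name parameter;
# the remaining queries follow the same shape.
PROFILE_QUERIES = {
    "calls_out": (
        "MATCH (p:CobolProgram {name: $name})-[:CALLS]->(called:CobolProgram) "
        "RETURN called.name AS name, called.domain AS domain"
    ),
    "files": (
        "MATCH (p:CobolProgram {name: $name})-[r:READS|WRITES]->(f:DataFile) "
        "RETURN f.name AS name, type(r) AS operation, f.description AS description"
    ),
}


def gather_profile(name: str, run: Runner, queries: dict = PROFILE_QUERIES) -> dict:
    """Run each profile query; a failing query yields [] instead of aborting."""
    profile: dict[str, list] = {}
    for section, cypher in queries.items():
        try:
            profile[section] = run(cypher, {"name": name})
        except Exception:  # broad on purpose: keep the partial data
            logger.exception("Profile query %r failed for %s", section, name)
            profile[section] = []
    return profile


def fake_run(cypher: str, params: dict) -> list:
    """Stand-in runner: the DataFile query simulates a lost connection."""
    if "DataFile" in cypher:
        raise RuntimeError("connection lost")
    return [{"name": "SUBPROG", "domain": None}]


print(gather_profile("MAINPROG", fake_run))
# {'calls_out': [{'name': 'SUBPROG', 'domain': None}], 'files': []}
```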

3.3 System-Level Statistics

```cypher
// Node counts by label
MATCH (n)
WHERE n:CobolProgram OR n:Procedure OR n:DataFile OR n:Variable
RETURN labels(n)[0] AS label, count(*) AS count

// Relationship counts (CONTAINS is the CobolProgram -> Procedure relationship)
MATCH ()-[r:CALLS|READS|WRITES|CONTAINS|DECLARES_VARIABLE]->()
RETURN type(r) AS relationship, count(*) AS count
```
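The two count queries feed the statistics table in the System Overview section. A minimal sketch, assuming the rows come back as dicts keyed by the RETURN aliases (`label`, `relationship`, `count`); `stats_table` is a hypothetical helper and the counts in the example are made up for illustration.

```python
def stats_table(node_rows: list[dict], rel_rows: list[dict]) -> str:
    """Render the two count queries above as a markdown table.

    Rows are assumed to look like {"label": ..., "count": ...} and
    {"relationship": ..., "count": ...}, matching the RETURN aliases.
    """
    lines = ["| Item | Count |", "| --- | --- |"]
    for row in sorted(node_rows, key=lambda r: r["label"]):
        lines.append(f"| {row['label']} nodes | {row['count']} |")
    for row in sorted(rel_rows, key=lambda r: r["relationship"]):
        lines.append(f"| {row['relationship']} relationships | {row['count']} |")
    return "\n".join(lines)


# Made-up counts, for illustration only.
print(stats_table(
    [{"label": "DataFile", "count": 17}, {"label": "CobolProgram", "count": 42}],
    [{"relationship": "CALLS", "count": 88}],
))
```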

4. Enrichment-Aware Documentation Logic

Programs may or may not have been enriched by the EnrichmentAgent. Documentation quality depends on this:

| Enrichment Status | Available Data | Documentation Approach |
| --- | --- | --- |
| Enriched (domain is not null and not 'Not Enriched') | domain, description, business_logic, complexity_score, loc | Use all metadata directly in the documentation |
| Not Enriched (domain is null or 'Not Enriched') | name, loc, raw source code (p.code) | Fall back to analyzing p.code (first ~2000 characters) to infer domain, complexity, and business logic |

Always check enrichment status before generating per-program content. If a program is not enriched and its source code is available, include a source snippet in the LLM prompt so the model can infer the missing context.
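A minimal sketch of the enrichment check and the context choice for the prompt, assuming each program is a dict with `domain`, `description`, `business_logic`, and `code` keys. The function names are hypothetical; the 'Not Enriched' placeholder and the ~2000-character snippet come from the table above.

```python
from typing import Any

NOT_ENRICHED = "Not Enriched"  # placeholder from COALESCE in the overview query
SOURCE_SNIPPET_CHARS = 2000     # "first ~2000 characters" per the table above


def is_enriched(program: dict[str, Any]) -> bool:
    """Enriched means domain is set and is not the placeholder string."""
    return program.get("domain") not in (None, "", NOT_ENRICHED)


def prompt_context(program: dict[str, Any]) -> str:
    """Pick enriched metadata, else a truncated source snippet, for the prompt."""
    if is_enriched(program):
        return (
            f"Domain: {program['domain']}\n"
            f"Description: {program.get('description') or 'N/A'}\n"
            f"Business logic: {program.get('business_logic') or 'N/A'}"
        )
    code = program.get("code") or ""
    if code:
        return "Source code (truncated):\n" + code[:SOURCE_SNIPPET_CHARS]
    return "No enrichment data or source code available."
```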


5. LLM Prompt Guidelines for Documentation

When invoking the LLM to generate per-program documentation, structure the prompt as:

  1. Role: "You are a technical writer creating COBOL program documentation for developers and business analysts."
  2. Context block: Include program name, domain, description, business_logic (or source code snippet if not enriched), LOC, and complexity.
  3. Graph data block: List outbound calls, inbound calls, data files, and procedures, as extracted from Neo4j.
  4. Section scaffolding: Ask for these sections in order:
    • Purpose and Overview (2-3 sentences)
    • Business Logic (rules and processes)
    • Key Functionality (bullet points)
    • Data Operations (files read/written and why)
    • Dependencies (upstream and downstream programs)
    • Technical Notes (complexity warnings, maintenance concerns)
  5. Format: Request markdown output. Keep it technical but clear.
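The five prompt parts above can be assembled in order by a small helper. This is one possible layout, not the project's actual prompt text (that lives in agents/document_generator.py); `build_doc_prompt` and its parameters are hypothetical, and `context` is expected to hold the enriched metadata or source snippet.

```python
ROLE = ("You are a technical writer creating COBOL program documentation "
        "for developers and business analysts.")

SECTIONS = [
    "Purpose and Overview (2-3 sentences)",
    "Business Logic (rules and processes)",
    "Key Functionality (bullet points)",
    "Data Operations (files read/written and why)",
    "Dependencies (upstream and downstream programs)",
    "Technical Notes (complexity warnings, maintenance concerns)",
]


def build_doc_prompt(name: str, context: str, loc: int, complexity: int,
                     calls_out: list[str], calls_in: list[str],
                     files: list[str], procedures: list[str]) -> str:
    """Assemble role, context, graph data, sections, and format request."""
    def fmt(items: list[str]) -> str:
        return ", ".join(items) if items else "none"

    parts = [
        ROLE,
        f"## Program: {name}\n{context}\nLOC: {loc}\nComplexity: {complexity}",
        "## Graph data\n"
        f"Calls out: {fmt(calls_out)}\n"
        f"Called by: {fmt(calls_in)}\n"
        f"Data files: {fmt(files)}\n"
        f"Procedures: {fmt(procedures)}",
        "Write these sections in order:\n"
        + "\n".join(f"{i}. {s}" for i, s in enumerate(SECTIONS, start=1)),
        "Respond in markdown. Keep it technical but clear.",
    ]
    return "\n\n".join(parts)
```

The resulting string would then be sent to the model returned by get_llm(temperature=0.3).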

LLM Configuration: Use temperature=0.3 for consistent, factual documentation output. The LLM is initialized via get_llm() from utils/llm_factory.py, which respects the provider set in settings (openai, groq, or google).


6. Complexity-Based Warnings

Apply these thresholds when generating documentation to flag risky programs:

| Complexity Score | Risk Level | Action in Document |
| --- | --- | --- |
| < 30 | Low | No warning needed |
| 30-69 | Medium | Note moderate complexity |
| >= 70 | High | Add a warning section advising refactoring or testing |
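The thresholds translate directly into a lookup. A minimal sketch in which a missing score is treated as 0, matching `COALESCE(p.complexity_score, 0)` in the overview query; `complexity_note` and the warning wording are hypothetical.

```python
from typing import Optional


def complexity_note(score: Optional[float]) -> tuple[str, str]:
    """Map a complexity score to (risk level, document text) per the table above."""
    s = score or 0  # missing score counts as 0, like the COALESCE in query 3.1
    if s >= 70:
        return "High", ("> **Warning:** high complexity. Consider refactoring "
                        "and add regression tests before making changes.")
    if s >= 30:
        return "Medium", "Note: moderate complexity."
    return "Low", ""
```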

7. Output Structure

Markdown (doc_format: markdown)

Use this document skeleton for system_overview:

```markdown
# COBOL System Technical Documentation
**Generated:** <timestamp>
**Total Programs Documented:** N
**Enriched Programs:** X / N

---
## 1. System Overview
- Purpose statement
- System statistics table (programs, data files, procedures)

## 2. Program Catalog
- Summary table: name, domain, complexity, LOC, calls out/in, files read/written

## 3. Detailed Program Specifications
- One subsection per program (### 3.1 PROGRAM_NAME)
- Generated by LLM (or fallback basic template if LLM fails)
```

For program_detail, output a single program's full specification (sections from the LLM prompt above).
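A minimal sketch that fills the system_overview skeleton, assuming `programs` holds rows shaped like the RETURN columns of the overview query in 3.1 and `sections` maps each program name to its generated (or fallback) markdown. `render_overview` is a hypothetical helper; the statistics table for section 1 is passed in as ready-made markdown.

```python
from datetime import datetime, timezone


def render_overview(programs: list[dict], overview_md: str,
                    sections: dict[str, str]) -> str:
    """Fill the system_overview skeleton (hypothetical helper).

    programs: rows shaped like the RETURN columns of the overview query.
    overview_md: purpose statement plus statistics table for section 1.
    sections: program_name -> markdown body (LLM output or fallback template).
    """
    total = len(programs)
    enriched = sum(1 for p in programs if p["domain"] != "Not Enriched")
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    out = [
        "# COBOL System Technical Documentation",
        f"**Generated:** {stamp}",
        f"**Total Programs Documented:** {total}",
        f"**Enriched Programs:** {enriched} / {total}",
        "",
        "---",
        "## 1. System Overview",
        overview_md,
        "",
        "## 2. Program Catalog",
        "",
        "| Program | Domain | Complexity | LOC | Calls Out/In | Files Read/Written |",
        "| --- | --- | --- | --- | --- | --- |",
    ]
    for p in programs:
        out.append(
            f"| {p['program_name']} | {p['domain']} | {p['complexity']} | {p['loc']} "
            f"| {p['calls_out']}/{p['calls_in']} | {p['files_read']}/{p['files_written']} |"
        )
    out += ["", "## 3. Detailed Program Specifications"]
    for i, p in enumerate(programs, start=1):
        name = p["program_name"]
        out += [f"### 3.{i} {name}", sections.get(name, "_No documentation generated._"), ""]
    return "\n".join(out)
```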

DOCX Export

DOCX export converts the generated markdown to a Word document:

  • # → Title (centered)
  • ## → Heading 1
  • ### → Heading 2
  • #### → Heading 3
  • Markdown tables → Word tables (styled Light Grid Accent 1, header row bolded)
  • Bullet lists → Word bullet lists
  • Code blocks → Courier New, 9pt, indented
  • Blockquotes (>) → Indented, italic, gray text
  • Emojis are stripped from headings and table cells for clean DOCX output
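The heading mapping and emoji stripping can be kept free of any DOCX dependency, so an exporter only has to apply the returned style name (for example via python-docx's `Document.add_paragraph(text, style=...)`). A minimal sketch; `classify_heading`, `strip_emojis`, and the emoji ranges are assumptions that approximate common emoji blocks and may also remove some legitimate symbols.

```python
import re
from typing import Optional

# Markdown heading depth -> Word style name, per the mapping above.
HEADING_STYLES = {1: "Title", 2: "Heading 1", 3: "Heading 2", 4: "Heading 3"}

# Approximate emoji ranges: pictographs, misc symbols/dingbats, flags, VS16.
EMOJI_RE = re.compile(
    "[\U0001F300-\U0001FAFF\U00002600-\U000027BF\U0001F1E6-\U0001F1FF\uFE0F]"
)


def strip_emojis(text: str) -> str:
    """Remove emoji characters and trim leftover surrounding whitespace."""
    return EMOJI_RE.sub("", text).strip()


def classify_heading(line: str) -> Optional[tuple[str, str]]:
    """Return (word_style, clean_text) for a #..#### heading line, else None."""
    m = re.match(r"^(#{1,4})\s+(.*)$", line)
    if not m:
        return None
    return HEADING_STYLES[len(m.group(1))], strip_emojis(m.group(2))
```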

8. Filtering Programs

The filters dict supports these keys when gathering programs:

| Filter Key | Values | Effect |
| --- | --- | --- |
| domain | Any string (e.g. 'Banking', 'Claims') | Filters by p.domain |
| complexity | 'low', 'medium', 'high' | Maps to score thresholds: < 30, 30-69, >= 70 |
| max_programs | Integer (default 10) | Limits result count for performance |
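A minimal sketch of translating the filters dict into a parameterized WHERE clause for the overview query. `filters_to_cypher` and the parameter names (`$domain`, `$min_score`, `$max_score`, `$limit`) are hypothetical. Note that these comparisons exclude programs whose complexity_score is null, which may or may not match the project's behavior.

```python
from typing import Any

# Score bounds per bucket: (inclusive minimum, exclusive maximum); None = open.
COMPLEXITY_BOUNDS = {"low": (None, 30), "medium": (30, 70), "high": (70, None)}


def filters_to_cypher(filters: dict[str, Any]) -> tuple[str, dict[str, Any]]:
    """Translate the filters dict into a WHERE clause and query parameters."""
    conditions: list[str] = []
    params: dict[str, Any] = {"limit": int(filters.get("max_programs", 10))}
    if filters.get("domain"):
        conditions.append("p.domain = $domain")
        params["domain"] = filters["domain"]
    level = filters.get("complexity")
    if level:
        lo, hi = COMPLEXITY_BOUNDS[level.lower()]
        # Programs with a null complexity_score fail these comparisons.
        if lo is not None:
            conditions.append("p.complexity_score >= $min_score")
            params["min_score"] = lo
        if hi is not None:
            conditions.append("p.complexity_score < $max_score")
            params["max_score"] = hi
    where = "WHERE " + " AND ".join(conditions) if conditions else ""
    return where, params
```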

9. Fallback & Error Handling

  • If LLM invocation fails for a program, fall back to a basic template that populates sections purely from Neo4j data (no LLM needed).
  • If Neo4j queries fail, return empty lists/dicts rather than crashing; log the error and continue with partial data.
  • Always include a generation timestamp and enrichment status warning in the final document.
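A minimal sketch of the LLM fallback, assuming the generator is any callable that returns markdown for a program dict and raises on failure. `document_program` and `basic_template` are hypothetical, and the template covers only a subset of the sections, filled purely from graph data.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)


def basic_template(program: dict) -> str:
    """Neo4j-only fallback: fills sections from graph data without an LLM."""
    def listing(key: str) -> str:
        items = program.get(key) or []
        return "\n".join(f"- {item}" for item in items) or "- None recorded"

    return "\n\n".join([
        "#### Purpose and Overview\n"
        + (program.get("description") or "Not available (program not enriched)."),
        "#### Data Operations\n" + listing("files"),
        "#### Dependencies\nCalls:\n" + listing("calls_out")
        + "\nCalled by:\n" + listing("calls_in"),
        f"#### Technical Notes\nLOC: {program.get('loc', 0)}, "
        f"complexity: {program.get('complexity', 0)}",
    ])


def document_program(program: dict, generate: Callable[[dict], str]) -> str:
    """Try the LLM generator; on any error, log it and use the basic template."""
    try:
        return generate(program)
    except Exception:
        logger.exception("LLM generation failed for %s; using fallback",
                         program.get("name"))
        return basic_template(program)
```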

10. Key Files

| File | Role |
| --- | --- |
| agents/document_generator.py | Main DocumentGeneratorAgent; orchestrates data gathering, LLM generation, and export |
| utils/neo4j_client.py | Singleton Neo4j client; use neo4j_client.query(cypher) |
| utils/llm_factory.py | Multi-provider LLM factory; use get_llm(temperature=...) |
| utils/state.py | DocumentGenerationState TypedDict for state shape |
| config/settings.py | All config: LLM provider, Neo4j URI, filter defaults |
| agents/graph_builder.py | Defines which nodes/relationships get created (schema source of truth) |
| agents/enrichment.py | Populates domain, business_logic, complexity_score on CobolProgram nodes |