index
Index codebases for semantic code search. Use this skill to make a project's source code searchable via vector similarity, enabling natural language queries like "how does authentication work" or "find the database connection logic".
Index Skill
Index codebases for semantic code search via the BrAIny API. The skill enables natural language queries over source code using vector embeddings.
Quick Start
# Index a codebase
bun scripts/index.ts index /path/to/project --name "my-project"
# Search indexed code
bun scripts/index.ts search "authentication middleware" --limit 10
# Reindex after changes
bun scripts/index.ts reindex
# List indexed codebases
bun scripts/index.ts list
Environment Variables
| Variable | Purpose | Default |
|---|---|---|
| `BRAINY_URL` | API base URL | http://localhost:3090 |
| `BRAINY_PROJECT_ID` | Project ID for organization | Required |
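A minimal sketch of how a script might read these two variables. The `loadConfig` helper and the `BrainyConfig` type are hypothetical names used for illustration; the actual `scripts/index.ts` may handle configuration differently.

```typescript
// Hypothetical helper: not taken from the BrAIny scripts.
interface BrainyConfig {
  baseUrl: string;
  projectId: string;
}

function loadConfig(env: Record<string, string | undefined>): BrainyConfig {
  // BRAINY_URL falls back to the documented default; trailing slashes are trimmed.
  const baseUrl = (env["BRAINY_URL"] ?? "http://localhost:3090").replace(/\/+$/, "");

  // BRAINY_PROJECT_ID has no default, so fail early with a clear message.
  const projectId = env["BRAINY_PROJECT_ID"];
  if (!projectId) {
    throw new Error("BRAINY_PROJECT_ID is required");
  }
  return { baseUrl, projectId };
}

// Usage: pass process.env when running under Bun or Node.
const config = loadConfig({ BRAINY_PROJECT_ID: "example-project" });
```

Taking the environment as a parameter rather than reading it globally keeps the helper easy to test.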
API Reference
Index a Codebase
POST /v1/projects/{projectId}/codebases
Content-Type: application/json
{
"name": "brainy",
"path": "/Users/jp/Projects/brainy"
}
Response
{
"success": true,
"codebaseId": "uuid",
"stats": {
"totalFiles": 52,
"newFiles": 52,
"modifiedFiles": 0,
"deletedFiles": 0,
"totalChunks": 215,
"indexedChunks": 215
}
}
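As an illustration, the request above could be issued from TypeScript roughly as follows. The endpoint, body fields, and response shape come from this section; the function names, the error handling, and the use of the global `fetch` (available in Bun and Node 18+) are assumptions.

```typescript
// Response shape as documented above.
interface IndexStats {
  totalFiles: number;
  newFiles: number;
  modifiedFiles: number;
  deletedFiles: number;
  totalChunks: number;
  indexedChunks: number;
}

interface IndexResponse {
  success: boolean;
  codebaseId: string;
  stats: IndexStats;
}

// Hypothetical helper: builds the URL and fetch options without sending anything.
function buildIndexRequest(baseUrl: string, projectId: string, name: string, path: string) {
  return {
    url: `${baseUrl}/v1/projects/${encodeURIComponent(projectId)}/codebases`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ name, path }),
    },
  };
}

// Hypothetical helper: sends the request and returns the parsed response.
async function indexCodebase(
  baseUrl: string,
  projectId: string,
  name: string,
  path: string,
): Promise<IndexResponse> {
  const { url, init } = buildIndexRequest(baseUrl, projectId, name, path);
  const res = await fetch(url, init);
  if (!res.ok) {
    throw new Error(`Indexing failed with HTTP ${res.status}`);
  }
  return (await res.json()) as IndexResponse;
}
```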
Reindex a Codebase
Performs an incremental update that processes only new and modified files:
POST /v1/codebases/{codebaseId}/reindex
Response
{
"success": true,
"codebaseId": "uuid",
"stats": {
"totalFiles": 54,
"newFiles": 2,
"modifiedFiles": 3,
"deletedFiles": 0,
"totalChunks": 220,
"indexedChunks": 220
}
}
Search Code
GET /v1/codebases/{codebaseId}/search?query={text}&limit={n}
| Parameter | Description |
|---|---|
| `query` | Natural language search text |
| `limit` | Max results (default 10, max 50) |
Response
{
"success": true,
"chunks": [
{
"id": "uuid",
"filePath": "src/services/embedding.ts",
"startLine": 1,
"endLine": 56,
"content": "import OpenAI from 'openai'...",
"tokenCount": 428,
"similarity": 0.42,
"language": "typescript"
}
],
"totalCount": 20
}
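A small sketch of building the search URL and rendering a result. The default of 10 and the cap of 50 come from the parameter table; clamping on the client side, the lower bound of 1, and the helper names are assumptions made for illustration.

```typescript
// Chunk shape as shown in the example response above.
interface SearchChunk {
  id: string;
  filePath: string;
  startLine: number;
  endLine: number;
  content: string;
  tokenCount: number;
  similarity: number;
  language: string;
}

// Hypothetical helper: URL-encodes the query and keeps limit within 1..50.
function buildSearchUrl(baseUrl: string, codebaseId: string, query: string, limit = 10): string {
  const requested = Number.isFinite(limit) ? Math.floor(limit) : 10;
  const clamped = Math.min(50, Math.max(1, requested));
  return (
    `${baseUrl}/v1/codebases/${encodeURIComponent(codebaseId)}/search` +
    `?query=${encodeURIComponent(query)}&limit=${clamped}`
  );
}

// Hypothetical helper: one-line summary of a hit, e.g. for terminal output.
function formatHit(chunk: SearchChunk): string {
  return `${chunk.filePath}:${chunk.startLine}-${chunk.endLine} (similarity ${chunk.similarity.toFixed(2)})`;
}
```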
List Codebases
GET /v1/projects/{projectId}/codebases
Get Codebase Details
GET /v1/codebases/{codebaseId}
Delete Codebase
Soft-deletes the codebase and all its chunks:
DELETE /v1/codebases/{codebaseId}
How Indexing Works
1. Crawling
The crawler walks the directory tree and discovers source files:
Automatically Ignored
- `.git`, `.svn`, `.hg` (version control)
- `node_modules`, `vendor` (dependencies)
- `dist`, `build`, `target`, `out` (build outputs)
- `.idea`, `.vscode` (IDE configs)
- `*.min.js`, `*.min.css` (minified files)
- Binary files (images, PDFs, archives, executables)
- Files > 5MB
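The ignore rules above could be expressed as a single predicate along these lines. This is a sketch, not the crawler's actual code: the binary extension list is an illustrative guess, 5MB is assumed to mean 5 × 1024 × 1024 bytes, and `shouldIgnore` is a hypothetical name.

```typescript
const IGNORED_DIRS = new Set([
  ".git", ".svn", ".hg",            // version control
  "node_modules", "vendor",         // dependencies
  "dist", "build", "target", "out", // build outputs
  ".idea", ".vscode",               // IDE configs
]);

// Assumption: a small sample of binary extensions; the real list is not documented here.
const BINARY_EXTENSIONS = new Set([
  ".png", ".jpg", ".jpeg", ".gif", ".pdf", ".zip", ".tar", ".gz", ".exe", ".dll",
]);

const MINIFIED = /\.min\.(js|css)$/;
const MAX_FILE_BYTES = 5 * 1024 * 1024; // assumption: 5MB read as 5 MiB

function shouldIgnore(relativePath: string, sizeBytes: number): boolean {
  const segments = relativePath.split(/[\\/]/);
  const fileName = segments[segments.length - 1] ?? "";

  // Any ignored directory anywhere in the path excludes the file.
  if (segments.slice(0, -1).some((segment) => IGNORED_DIRS.has(segment))) return true;
  if (MINIFIED.test(fileName)) return true;

  const dot = fileName.lastIndexOf(".");
  const extension = dot >= 0 ? fileName.slice(dot).toLowerCase() : "";
  if (BINARY_EXTENSIONS.has(extension)) return true;

  return sizeBytes > MAX_FILE_BYTES;
}
```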
Language Detection
Detected by file extension: TypeScript, JavaScript, Python, Go, Rust, Java, Ruby, PHP, C/C++, SQL, HTML, CSS, JSON, YAML, Markdown, and more.
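Extension-based detection can be sketched as a simple lookup table. Only the `typescript` label is confirmed by the example search response; the other labels, the exact extension set, and the `detectLanguage` name are assumptions.

```typescript
// Assumption: lowercase labels, following the "typescript" value in the search response.
const LANGUAGE_BY_EXTENSION = new Map<string, string>([
  [".ts", "typescript"], [".tsx", "typescript"],
  [".js", "javascript"], [".jsx", "javascript"],
  [".py", "python"], [".go", "go"], [".rs", "rust"],
  [".java", "java"], [".rb", "ruby"], [".php", "php"],
  [".c", "c"], [".h", "c"], [".cpp", "cpp"], [".hpp", "cpp"],
  [".sql", "sql"], [".html", "html"], [".css", "css"],
  [".json", "json"], [".yaml", "yaml"], [".yml", "yaml"],
  [".md", "markdown"],
]);

// Returns undefined when the extension is unknown or missing.
function detectLanguage(filePath: string): string | undefined {
  // Look at the file name only, so dots in directory names are not mistaken for extensions.
  const fileName = filePath.split(/[\\/]/).pop() ?? "";
  const dot = fileName.lastIndexOf(".");
  if (dot < 0) return undefined;
  return LANGUAGE_BY_EXTENSION.get(fileName.slice(dot).toLowerCase());
}
```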
2. Chunking
Files are split into chunks optimized for embeddings:
| Parameter | Value | Purpose |
|---|---|---|
| Target tokens | ~500 | Optimal semantic density |
| Max tokens | 2000 | Safe buffer below model limit |
| Min tokens | 50 | Avoid tiny fragments |
Semantic Boundaries
Chunks break at natural points:
- Empty lines
- Function/class/interface definitions
- Export statements
- Comment blocks (documentation)
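The chunking rules above can be sketched as a line-based splitter. This is an illustration under stated assumptions, not the project's chunker: tokens are estimated as roughly four characters each instead of using a real tokenizer, the boundary pattern only covers TypeScript-like syntax, and `chunkSource` is a hypothetical name.

```typescript
interface Chunk {
  content: string;
  startLine: number; // 1-based, inclusive
  endLine: number;   // 1-based, inclusive
  tokenCount: number;
}

const TARGET_TOKENS = 500;
const MAX_TOKENS = 2000;
const MIN_TOKENS = 50;

// Assumption: about 4 characters per token; a real tokenizer would be more accurate.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Empty lines, function/class/interface definitions, export statements, comment blocks.
const BOUNDARY = /^\s*(export\s|function\s|class\s|interface\s|\/\*|$)/;

function chunkSource(source: string): Chunk[] {
  const lines = source.split("\n");
  const chunks: Chunk[] = [];
  let buffer: string[] = [];
  let startLine = 1;

  const flush = (endLine: number): void => {
    const content = buffer.join("\n");
    chunks.push({ content, startLine, endLine, tokenCount: estimateTokens(content) });
    buffer = [];
    startLine = endLine + 1;
  };

  lines.forEach((line, index) => {
    if (buffer.length > 0) {
      const current = buffer.join("\n");
      const reachedTarget = estimateTokens(current) >= TARGET_TOKENS && BOUNDARY.test(line);
      const wouldExceedMax = estimateTokens(current + "\n" + line) > MAX_TOKENS;
      // index is 0-based, so the last buffered line has line number `index`.
      if (reachedTarget || wouldExceedMax) flush(index);
    }
    buffer.push(line);
  });

  if (buffer.length > 0) {
    const tail = buffer.join("\n");
    const last = chunks[chunks.length - 1];
    const merged = last ? last.content + "\n" + tail : tail;
    if (last && estimateTokens(tail) < MIN_TOKENS && estimateTokens(merged) <= MAX_TOKENS) {
      // Avoid a tiny trailing fragment by folding it into the previous chunk.
      last.content = merged;
      last.endLine = lines.length;
      last.tokenCount = estimateTokens(merged);
    } else {
      flush(lines.length);
    }
  }
  return chunks;
}
```

In this sketch a single line longer than the maximum still becomes its own oversized chunk, and a file smaller than the minimum is kept as one chunk; a production chunker would need explicit policies for both cases.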
3. Embedding
Each chunk is embedded using OpenAI's text-embedding-3-small model (1536 dimensions). Embeddings enable semantic similarity search.
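To make "semantic similarity" concrete, here is a minimal cosine similarity function over two embedding vectors. It is a generic illustration of the metric, written without any embedding API calls; in this system the comparison is presumably performed inside the database rather than in application code.

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|), ranging from -1 to 1.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  // A zero vector has no direction; treat its similarity as 0.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Vectors pointing in the same direction score 1 regardless of length, which is why the measure suits comparing a short query against a longer code chunk.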
4. Storage
Chunks are stored in PostgreSQL with pgvector:
code_chunks (
id UUID PRIMARY KEY,
codebase_id UUID NOT NULL,
file_id UUID,
chunk_index INTEGER NOT NULL,
content TEXT NOT NULL,
start_line INTEGER,
end_line INTEGER,
token_count INTEGER NOT NULL,
embedding vector(1536),
deleted_at TIMESTAMPTZ,
created_at TIMESTAMPTZ
)
Indexes
- HNSW on `embedding` for fast vector similarity (cosine distance)
- B-tree on `codebase_id`, `file_id` for filtering
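A sketch of a similarity query these indexes would serve, built as a parameterized statement. The table and column names come from the schema above, and `<=>` is pgvector's cosine distance operator; the query text itself, the `buildSimilarityQuery` and `toVectorLiteral` names, and the `{ text, values }` shape are assumptions and not necessarily what BrAIny runs.

```typescript
interface SqlQuery {
  text: string;
  values: (string | number)[];
}

// pgvector accepts vectors in the text form "[0.1,0.2,...]".
function toVectorLiteral(embedding: number[]): string {
  return `[${embedding.join(",")}]`;
}

// Hypothetical query builder: similarity is reported as 1 minus cosine distance.
function buildSimilarityQuery(codebaseId: string, queryEmbedding: number[], limit: number): SqlQuery {
  const text = [
    "SELECT id, content, start_line, end_line, token_count,",
    "       1 - (embedding <=> $1::vector) AS similarity",
    "FROM code_chunks",
    "WHERE codebase_id = $2 AND deleted_at IS NULL", // skip soft-deleted chunks
    "ORDER BY embedding <=> $1::vector",
    "LIMIT $3",
  ].join("\n");
  return { text, values: [toVectorLiteral(queryEmbedding), codebaseId, limit] };
}
```

Ordering by the distance expression, rather than by the derived similarity, is what typically lets PostgreSQL use an HNSW index built with cosine operators.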
Incremental Updates
When reindexing, the system:
- Compares file hashes to detect changes
- Skips unchanged files
- Re-chunks and re-embeds modified files
- Removes chunks from deleted files
- Adds chunks for new files
Because only changed files are re-chunked and re-embedded, reindexing after small changes is fast.
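The hash comparison can be sketched as a diff between the stored hashes and the hashes of the files currently on disk. The `planReindex` function and the `ReindexPlan` type are hypothetical; the category names mirror the `stats` fields in the API responses.

```typescript
interface ReindexPlan {
  newFiles: string[];       // on disk, not yet indexed
  modifiedFiles: string[];  // indexed, but the hash changed
  deletedFiles: string[];   // indexed, no longer on disk
  unchangedFiles: string[]; // same hash, skipped
}

// Both maps go from file path to content hash.
function planReindex(stored: Map<string, string>, current: Map<string, string>): ReindexPlan {
  const plan: ReindexPlan = { newFiles: [], modifiedFiles: [], deletedFiles: [], unchangedFiles: [] };

  for (const [path, hash] of current) {
    const previous = stored.get(path);
    if (previous === undefined) {
      plan.newFiles.push(path);
    } else if (previous !== hash) {
      plan.modifiedFiles.push(path);
    } else {
      plan.unchangedFiles.push(path);
    }
  }

  for (const path of stored.keys()) {
    if (!current.has(path)) plan.deletedFiles.push(path);
  }
  return plan;
}
```

Only `newFiles` and `modifiedFiles` need chunking and embedding, which is where the time savings come from.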
Search Tips
Natural Language Queries
- "How does authentication work"
- "Database connection pooling"
- "Error handling in API routes"
- "Token generation for embeddings"
Specific Code Patterns
- "PostgreSQL query with vector search"
- "Hono middleware for validation"
- "TypeScript interface for memory"
Finding Implementations
- "EmbeddingService class"
- "chunkByTokens function"
- "Memory repository insert"
Database Schema
codebases
codebases (
id UUID PRIMARY KEY,
project_id UUID NOT NULL,
name TEXT NOT NULL,
path TEXT NOT NULL,
total_files INTEGER DEFAULT 0,
total_chunks INTEGER DEFAULT 0,
last_indexed_at TIMESTAMPTZ,
deleted_at TIMESTAMPTZ,
created_at TIMESTAMPTZ,
updated_at TIMESTAMPTZ
)
codebase_files
codebase_files (
id UUID PRIMARY KEY,
codebase_id UUID NOT NULL,
file_path TEXT NOT NULL,
file_hash TEXT NOT NULL,
language TEXT,
chunk_count INTEGER DEFAULT 0,
last_indexed_at TIMESTAMPTZ,
deleted_at TIMESTAMPTZ,
created_at TIMESTAMPTZ
)
code_chunks
code_chunks (
id UUID PRIMARY KEY,
codebase_id UUID NOT NULL,
file_id UUID,
chunk_index INTEGER NOT NULL,
content TEXT NOT NULL,
start_line INTEGER,
end_line INTEGER,
token_count INTEGER NOT NULL,
embedding vector(1536),
deleted_at TIMESTAMPTZ,
created_at TIMESTAMPTZ
)
Usage Guidelines
- Index codebases at the start of a project for code-aware assistance
- Reindex after significant changes (new features, refactors)
- Use semantic search to understand unfamiliar codebases
- Search for patterns when implementing similar functionality
- Keep different projects in separate BrAIny projects for isolation