name: transformers-convert
description: "Use this skill when converting custom PyTorch models to Hugging Face Transformers format. Helps with: (1) Creating PretrainedConfig and PreTrainedModel classes, (2) Writing ImageProcessor/Tokenizer, (3) Compatibility testing, (4) Hub upload preparation. Use when the user wants to make their model compatible with transformers library."

Hugging Face Transformers Model Conversion

Convert custom PyTorch models to Hugging Face Transformers format while maintaining exact compatibility with the original implementation.

Overview

This skill provides a systematic workflow for transformers conversion:

Extract hardcoded values into PretrainedConfig
Create PreTrainedModel wrapper
Build ImageProcessor/Tokenizer
Test equivalence thoroughly
Prepare for Hub upload

Important: Use validation mode (parallel implementations) first to verify equivalence, then replace the original.

Conversion Workflow

Step 1: Analyze the Custom Model

Ask the user to specify:

Path to the custom model implementation
Model type (vision, text, multimodal)
Task (classification, segmentation, generation, etc.)
Validation mode or replacement mode

Then identify:

Model architecture and components
Input/output formats
Key hyperparameters and hardcoded values
Pretrained weights location
Preprocessing pipeline
Custom layers or modules
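
One way to start the search for hardcoded values is to scan the original source for numeric literals. The sketch below is a minimal illustration using only the standard library; `find_numeric_literals` and the example snippet are hypothetical names, not part of the skill, and the output still needs manual review to decide which values belong in the config.

```python
import ast


def find_numeric_literals(source: str) -> list[tuple[int, float]]:
    """Return (line_number, value) for every int/float literal in the source."""
    found = []
    for node in ast.walk(ast.parse(source)):
        is_number = isinstance(node, ast.Constant) and isinstance(node.value, (int, float))
        if is_number and not isinstance(node.value, bool):
            found.append((node.lineno, node.value))
    return sorted(found)


# Hypothetical snippet standing in for the original model file
example_source = '''
class Encoder:
    def __init__(self):
        self.hidden_dim = 128
        self.dilations = [1, 2, 4]
'''

for lineno, value in find_numeric_literals(example_source):
    print(f"line {lineno}: {value}")
```

In practice the source would be read from the path the user supplied, and every reported literal is a candidate config parameter.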

Step 2: Create PretrainedConfig Class

Key principle: Extract ALL hardcoded values from the model as configurable parameters.

Template:

from transformers import PretrainedConfig
from typing import List, Optional

class {ModelName}Config(PretrainedConfig):
    model_type = "{model_name}"

    def __init__(
        self,
        # Core architecture parameters
        hidden_dim: int = 128,
        num_layers: int = 4,

        # Input/output parameters
        image_size: int = 1024,
        num_channels: int = 3,
        num_labels: int = 1,

        # Component-specific parameters (extract from original)
        component_param: List[int] | None = None,

        **kwargs,
    ):
        super().__init__(**kwargs)

        self.hidden_dim = hidden_dim
        self.num_layers = num_layers
        self.image_size = image_size
        self.num_channels = num_channels
        self.num_labels = num_labels

        # Use default if not specified
        self.component_param = (
            component_param if component_param is not None else [1, 2, 4]
        )

Critical: Ensure default values match what the pretrained weights expect!
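
As a rough check that defaults and overrides survive serialization, a config can be saved and reloaded. This is a minimal sketch, assuming a made-up `DemoSegConfig` with a hypothetical `dilations` parameter; it only uses `save_pretrained` and `from_pretrained` from `PretrainedConfig`.

```python
import tempfile
from typing import List, Optional

from transformers import PretrainedConfig


class DemoSegConfig(PretrainedConfig):
    # "demo_seg" is a made-up model_type used only for this illustration
    model_type = "demo_seg"

    def __init__(self, hidden_dim: int = 128, dilations: Optional[List[int]] = None, **kwargs):
        super().__init__(**kwargs)
        self.hidden_dim = hidden_dim
        # Mutable default is created per instance, as in the template above
        self.dilations = dilations if dilations is not None else [1, 2, 4]


config = DemoSegConfig(hidden_dim=256)
with tempfile.TemporaryDirectory() as tmp_dir:
    config.save_pretrained(tmp_dir)  # writes config.json
    reloaded = DemoSegConfig.from_pretrained(tmp_dir)

print(reloaded.hidden_dim, reloaded.dilations)
```

The overridden value and the untouched default should both come back unchanged; a mismatch here usually means an attribute was not assigned in `__init__`.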

Step 3: Create PreTrainedModel Class

Template:

from typing import Optional, Tuple, Union

import torch
from transformers import PreTrainedModel
from transformers.modeling_outputs import SemanticSegmenterOutput

class {ModelName}ForTask(PreTrainedModel):
    config_class = {ModelName}Config

    def __init__(self, config: {ModelName}Config):
        super().__init__(config)
        self.config = config

        # Initialize layers using config parameters (no hardcoded values!)
        self.encoder = Encoder(
            hidden_dim=config.hidden_dim,
            num_layers=config.num_layers,
        )

    def forward(
        self,
        pixel_values: torch.FloatTensor,
        labels: Optional[torch.LongTensor] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
    ) -> Union[Tuple, SemanticSegmenterOutput]:
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        # Forward pass
        logits = self.encoder(pixel_values)

        # Calculate loss if needed
        loss = None
        if labels is not None:
            # Compute loss
            pass

        if not return_dict:
            output = (logits,)
            return ((loss,) + output) if loss is not None else output

        return SemanticSegmenterOutput(
            loss=loss,
            logits=logits,
            hidden_states=None,
            attentions=None,
        )
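
The loss branch in the template is left empty. For a multi-class segmentation task it could be filled in along the following lines. This is a sketch under assumptions: `segmentation_loss` is a hypothetical helper, the logits are assumed to be lower resolution than the labels, and `ignore_index=255` is a common convention, not something the original model is known to use. With `num_labels = 1` a binary loss would be needed instead, and in all cases the loss should match what the original training code used.

```python
import torch
import torch.nn.functional as F


def segmentation_loss(logits: torch.Tensor, labels: torch.Tensor, ignore_index: int = 255) -> torch.Tensor:
    """Pixel-wise cross-entropy; logits are upsampled to the label resolution first."""
    # logits: (batch, num_labels, h, w); labels: (batch, H, W) holding class indices
    upsampled = F.interpolate(logits, size=labels.shape[-2:], mode="bilinear", align_corners=False)
    return F.cross_entropy(upsampled, labels, ignore_index=ignore_index)


logits = torch.randn(2, 3, 8, 8)
labels = torch.randint(0, 3, (2, 32, 32))
loss = segmentation_loss(logits, labels)
print(loss.dim())  # 0, i.e. a scalar tensor
```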

Step 4: Create ImageProcessor/Tokenizer

For vision models - Create ImageProcessor:

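from typing import List, Optional, Union

from transformers.image_processing_utils import BaseImageProcessor, BatchFeature
from transformers.image_utils import ImageInput
from transformers.utils import TensorType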

class {ModelName}ImageProcessor(BaseImageProcessor):
    model_input_names = ["pixel_values"]

    def __init__(
        self,
        size: int = 1024,
        resample: str = "bilinear",
        do_normalize: bool = True,
        image_mean: List[float] | None = None,
        image_std: List[float] | None = None,
        **kwargs,
    ):
        super().__init__(**kwargs)
        self.size = size
        self.resample = resample
        self.do_normalize = do_normalize
        self.image_mean = image_mean if image_mean is not None else [0.485, 0.456, 0.406]
        self.image_std = image_std if image_std is not None else [0.229, 0.224, 0.225]

    def preprocess(
        self,
        images: ImageInput,
        return_tensors: Optional[Union[str, TensorType]] = None,
        **kwargs,
    ) -> BatchFeature:
        # Implement preprocessing that matches the original pipeline exactly,
        # then return a BatchFeature containing "pixel_values"
        raise NotImplementedError
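
The core of such a preprocess method might look like the sketch below. It assumes the original pipeline resizes to a square, scales to [0, 1], and normalizes with the mean and std defaults shown in the template; `preprocess_image` is a hypothetical helper, and the real steps must be copied from the original code, not from this example.

```python
import numpy as np
from PIL import Image


def preprocess_image(image: Image.Image, size: int = 1024,
                     mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)) -> np.ndarray:
    """Resize, scale to [0, 1], normalize per channel, and return a CHW float32 array."""
    resized = image.convert("RGB").resize((size, size), resample=Image.BILINEAR)
    array = np.asarray(resized, dtype=np.float32) / 255.0  # HWC in [0, 1]
    array = (array - np.array(mean, dtype=np.float32)) / np.array(std, dtype=np.float32)
    return array.transpose(2, 0, 1)  # HWC -> CHW


# Synthetic image so the example runs without any files on disk
pixel_values = preprocess_image(Image.new("RGB", (640, 480), color=(128, 64, 32)), size=64)
print(pixel_values.shape, pixel_values.dtype)
```

Inside the processor, the resulting arrays would typically be wrapped as `BatchFeature(data={"pixel_values": [...]}, tensor_type=return_tensors)`.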

For text models - Create tokenizer configuration.

Step 5: Create Compatibility Tests

Critical: Always use the SAME preprocessed tensor for both models when comparing outputs.

import pytest
import torch

def test_preprocessing_matches():
    """Test that preprocessing is equivalent."""
    old_tensor = old_preprocessing(image)
    new_tensor = processor(image, return_tensors="pt")["pixel_values"][0]
    assert torch.allclose(old_tensor, new_tensor, atol=1e-6)

def test_single_image_output_matches():
    """Test that model outputs match."""
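    # Load models and switch both to inference mode (dropout off, BatchNorm frozen)
    old_model = OldModel()
    new_model = NewModel(NewConfig())
    new_model.load_state_dict(old_model.state_dict())
    old_model.eval()
    new_model.eval()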

    # Prepare SAME input
    preprocessed = preprocess(image)

    with torch.no_grad():
        old_output = old_model(preprocessed)
        new_output = new_model(pixel_values=preprocessed)

    # Allow small numerical differences: |old - new| <= atol + rtol * |new|
    assert torch.allclose(old_output, new_output.logits, atol=5e-3, rtol=1e-2)

def test_batch_output_matches():
    """Test batch processing."""
    # Test with batch of images

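def test_state_dict_compatible():
    """Test that weights can be loaded."""
    old_model = OldModel()
    new_model = NewModel(NewConfig())
    # strict=True (the default) raises on missing or unexpected keys
    new_model.load_state_dict(old_model.state_dict())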
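
The batch test above is only a stub. A minimal sketch of what it could check is shown here, using toy `nn.Sequential` models as hypothetical stand-ins for the original and converted models: the two models should agree on a batch, and the batched result should agree with running each image on its own.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Toy stand-ins (hypothetical) for the original model and its converted counterpart
old_model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(), nn.Conv2d(8, 2, 1)).eval()
new_model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(), nn.Conv2d(8, 2, 1)).eval()
new_model.load_state_dict(old_model.state_dict())

batch = torch.randn(4, 3, 16, 16)

with torch.no_grad():
    old_batch_out = old_model(batch)
    new_batch_out = new_model(batch)
    # Running the whole batch must agree with running each image separately
    new_single_out = torch.cat([new_model(image.unsqueeze(0)) for image in batch])

assert torch.allclose(old_batch_out, new_batch_out, atol=5e-3, rtol=1e-2)
assert torch.allclose(new_batch_out, new_single_out, atol=5e-3, rtol=1e-2)
```

A disagreement between the batched and per-image results often points to batch-dependent behavior such as normalization layers left in training mode.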

Step 6: Create MODEL_CARD.md

Generate a comprehensive model card that follows Hugging Face standards. Include:

Model description
Usage examples
Training details
Evaluation metrics
Citation information

See the Hugging Face model card documentation for a template.

Step 7: Create Hub Push Script

Critical: Register classes with register_for_auto_class() before pushing.

#!/usr/bin/env python3
from huggingface_hub import HfApi
from {module}.transformers import {ModelName}Config, {ModelName}ForTask, {ModelName}ImageProcessor

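def main(repo_id: str, local_dir: str, token: str | None = None):
    # repo_id, local_dir, and token are supplied by the caller (e.g. CLI args or env vars)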
    # CRITICAL: Register for Auto* support
    {ModelName}Config.register_for_auto_class()
    {ModelName}ForTask.register_for_auto_class("AutoModel")
    {ModelName}ImageProcessor.register_for_auto_class("AutoImageProcessor")

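    # Load the original model with its pretrained weights
    # (OriginalModel is a placeholder for your existing class and loading code)
    original_model = OriginalModel()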

    # Create transformers-compatible model
    config = {ModelName}Config()
    model = {ModelName}ForTask(config)
    model.load_state_dict(original_model.state_dict())

    # Save and push
    model.save_pretrained(local_dir)
    config.save_pretrained(local_dir)
    processor = {ModelName}ImageProcessor()
    processor.save_pretrained(local_dir)

    api = HfApi(token=token)
    api.create_repo(repo_id=repo_id, exist_ok=True)
    api.upload_folder(repo_id=repo_id, folder_path=local_dir)

Implementation Strategy

Validation Mode (Recommended First)

Create parallel implementations:

{project}/
├── {module}/
│   ├── original_model.py           # Existing
│   └── transformers/               # NEW - for validation
│       ├── __init__.py
│       ├── configuration_{model}.py
│       ├── modeling_{model}.py
│       └── processing_{model}.py
└── tests/
    └── test_transformers_compatibility.py

Workflow:

Create transformers/ package alongside original
Run compatibility tests
Debug any discrepancies
Once tests pass → proceed to replacement

Replacement Mode (After Validation)

Once equivalence is verified:

{project}/
└── {module}/
    ├── configuration_{model}.py    # Replaces original
    ├── modeling_{model}.py
    └── processing_{model}.py

Workflow:

Remove or archive original implementation
Move transformers/* files up one level
Update all imports
Update tests
Update documentation

Common Issues

For detailed troubleshooting, see references/common-pitfalls.md.

Quick reference:

Hardcoded values: Extract to config with matching defaults
Preprocessing mismatches: Use exact same pipeline and parameters
State dict keys: Keep layer names matching original
Test tolerance: Use atol=5e-3 with rtol=1e-2 in torch.allclose for numerical differences
Device handling: Use self.device from PreTrainedModel
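Image size order: Hugging Face's built-in post_process_semantic_segmentation() takes target_sizes as (height, width), whereas PIL's image.size is (width, height); check which order your implementation expects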
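
For the state dict item in the quick reference, a key diff makes renamed layers visible before `load_state_dict` fails. This is a small standard-library sketch; `diff_state_dict_keys` and the key names are hypothetical, and in practice the inputs would be `old_model.state_dict().keys()` and `new_model.state_dict().keys()`.

```python
def diff_state_dict_keys(old_keys, new_keys):
    """Return (missing, unexpected) in the sense used by load_state_dict."""
    old_keys, new_keys = set(old_keys), set(new_keys)
    missing = sorted(new_keys - old_keys)     # expected by the new model, absent from the old weights
    unexpected = sorted(old_keys - new_keys)  # present in the old weights, unknown to the new model
    return missing, unexpected


# Hypothetical key names: the wrapper renamed "backbone" to "encoder"
old = ["backbone.conv.weight", "backbone.conv.bias", "head.weight"]
new = ["encoder.conv.weight", "encoder.conv.bias", "head.weight"]

missing, unexpected = diff_state_dict_keys(old, new)
print("missing:", missing)
print("unexpected:", unexpected)
```

Both lists being empty is the goal; a matching pair of missing and unexpected entries usually indicates a renamed attribute.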

Debugging Strategy

If outputs don't match:

Check preprocessing produces identical tensors
Check state dict loaded correctly
Compare intermediate outputs layer by layer
Check device placement
Check determinism with torch.manual_seed
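
The layer-by-layer comparison can be sketched with forward hooks. The helpers `capture_activations` and `first_mismatch` are hypothetical, the toy models stand in for the original and converted models, and the sketch assumes every leaf module returns a single tensor and that both models use the same layer names.

```python
import torch
from torch import nn


def capture_activations(model: nn.Module, inputs: torch.Tensor) -> dict:
    """Run one forward pass and record the output of every leaf module by name."""
    activations, handles = {}, []
    for name, module in model.named_modules():
        if len(list(module.children())) == 0:  # leaf modules only
            handles.append(module.register_forward_hook(
                lambda mod, args, output, name=name: activations.__setitem__(name, output.detach())
            ))
    with torch.no_grad():
        model(inputs)
    for handle in handles:
        handle.remove()  # leave the model without hooks afterwards
    return activations


def first_mismatch(old_acts: dict, new_acts: dict, atol: float = 5e-3, rtol: float = 1e-2):
    """Return the first shared layer name whose outputs differ, or None if all match."""
    for name, old_value in old_acts.items():
        if name in new_acts and not torch.allclose(old_value, new_acts[name], atol=atol, rtol=rtol):
            return name
    return None


torch.manual_seed(0)
old_model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2)).eval()
new_model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2)).eval()
new_model.load_state_dict(old_model.state_dict())
with torch.no_grad():
    new_model[2].bias.add_(1.0)  # deliberately corrupt the last layer

x = torch.randn(3, 4)
mismatch = first_mismatch(capture_activations(old_model, x), capture_activations(new_model, x))
print(mismatch)
```

Because activations are recorded in execution order, the first reported name is the earliest layer at which the two models diverge.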

See references/common-pitfalls.md for detailed debugging steps.

Project Learnings

After completing a conversion, add learnings to references/learnings.md.

This accumulates knowledge from each project to avoid repeating mistakes.
