transformers-convert
Use this skill when converting custom PyTorch models to Hugging Face Transformers format. Helps with: (1) Creating PretrainedConfig and PreTrainedModel classes, (2) Writing ImageProcessor/Tokenizer, (3) Compatibility testing, (4) Hub upload preparation. Use when the user wants to make their model compatible with transformers library.
SKILL.md
| Name | transformers-convert |
| Description | Use this skill when converting custom PyTorch models to Hugging Face Transformers format. Helps with: (1) Creating PretrainedConfig and PreTrainedModel classes, (2) Writing ImageProcessor/Tokenizer, (3) Compatibility testing, (4) Hub upload preparation. Use when the user wants to make their model compatible with transformers library. |
name: transformers-convert description: "Use this skill when converting custom PyTorch models to Hugging Face Transformers format. Helps with: (1) Creating PretrainedConfig and PreTrainedModel classes, (2) Writing ImageProcessor/Tokenizer, (3) Compatibility testing, (4) Hub upload preparation. Use when the user wants to make their model compatible with transformers library."
Hugging Face Transformers Model Conversion
Convert custom PyTorch models to Hugging Face Transformers format while maintaining exact compatibility with the original implementation.
Overview
This skill provides a systematic workflow for transformers conversion:
- Extract hardcoded values into PretrainedConfig
- Create PreTrainedModel wrapper
- Build ImageProcessor/Tokenizer
- Test equivalence thoroughly
- Prepare for Hub upload
Important: Use validation mode (parallel implementations) first to verify equivalence, then replace the original.
Conversion Workflow
Step 1: Analyze the Custom Model
Ask the user to specify:
- Path to the custom model implementation
- Model type (vision, text, multimodal)
- Task (classification, segmentation, generation, etc.)
- Validation mode or replacement mode
Then identify:
- Model architecture and components
- Input/output formats
- Key hyperparameters and hardcoded values
- Pretrained weights location
- Preprocessing pipeline
- Custom layers or modules
Step 2: Create PretrainedConfig Class
Key principle: Extract ALL hardcoded values from the model as configurable parameters.
Template:
from transformers import PretrainedConfig
from typing import List, Optional
class {ModelName}Config(PretrainedConfig):
model_type = "{model_name}"
def __init__(
self,
# Core architecture parameters
hidden_dim: int = 128,
num_layers: int = 4,
# Input/output parameters
image_size: int = 1024,
num_channels: int = 3,
num_labels: int = 1,
# Component-specific parameters (extract from original)
component_param: List[int] | None = None,
**kwargs,
):
super().__init__(**kwargs)
self.hidden_dim = hidden_dim
self.num_layers = num_layers
self.image_size = image_size
self.num_channels = num_channels
self.num_labels = num_labels
# Use default if not specified
self.component_param = (
component_param if component_param is not None else [1, 2, 4]
)
Critical: Ensure default values match what the pretrained weights expect!
Step 3: Create PreTrainedModel Class
Template:
from transformers import PreTrainedModel
from transformers.modeling_outputs import SemanticSegmenterOutput
class {ModelName}ForTask(PreTrainedModel):
config_class = {ModelName}Config
def __init__(self, config: {ModelName}Config):
super().__init__(config)
self.config = config
# Initialize layers using config parameters (no hardcoded values!)
self.encoder = Encoder(
hidden_dim=config.hidden_dim,
num_layers=config.num_layers,
)
def forward(
self,
pixel_values: torch.FloatTensor,
labels: Optional[torch.LongTensor] = None,
output_hidden_states: Optional[bool] = None,
return_dict: Optional[bool] = None,
) -> Union[Tuple, SemanticSegmenterOutput]:
return_dict = return_dict if return_dict is not None else self.config.use_return_dict
# Forward pass
logits = self.encoder(pixel_values)
# Calculate loss if needed
loss = None
if labels is not None:
# Compute loss
pass
if not return_dict:
output = (logits,)
return ((loss,) + output) if loss is not None else output
return SemanticSegmenterOutput(
loss=loss,
logits=logits,
hidden_states=None,
attentions=None,
)
Step 4: Create ImageProcessor/Tokenizer
For vision models - Create ImageProcessor:
from transformers import BaseImageProcessor
class {ModelName}ImageProcessor(BaseImageProcessor):
model_input_names = ["pixel_values"]
def __init__(
self,
size: int = 1024,
resample: str = "bilinear",
do_normalize: bool = True,
image_mean: List[float] | None = None,
image_std: List[float] | None = None,
**kwargs,
):
super().__init__(**kwargs)
self.size = size
self.resample = resample
self.do_normalize = do_normalize
self.image_mean = image_mean if image_mean is not None else [0.485, 0.456, 0.406]
self.image_std = image_std if image_std is not None else [0.229, 0.224, 0.225]
def preprocess(
self,
images: ImageInput,
return_tensors: Optional[Union[str, TensorType]] = None,
**kwargs,
) -> BatchFeature:
# Implement preprocessing matching original
# Return BatchFeature with pixel_values
For text models - Create tokenizer configuration.
Step 5: Create Compatibility Tests
Critical: Always use the SAME preprocessed tensor for both models when comparing outputs.
import pytest
import torch
def test_preprocessing_matches():
"""Test that preprocessing is equivalent."""
old_tensor = old_preprocessing(image)
new_tensor = processor(image, return_tensors="pt")["pixel_values"][0]
assert torch.allclose(old_tensor, new_tensor, atol=1e-6)
def test_single_image_output_matches():
"""Test that model outputs match."""
# Load models
old_model = OldModel()
new_model = NewModel(NewConfig())
new_model.load_state_dict(old_model.state_dict())
# Prepare SAME input
preprocessed = preprocess(image)
with torch.no_grad():
old_output = old_model(preprocessed)
new_output = new_model(pixel_values=preprocessed)
# Use 0.5% tolerance for numerical differences
assert torch.allclose(old_output, new_output.logits, atol=5e-3, rtol=1e-2)
def test_batch_output_matches():
"""Test batch processing."""
# Test with batch of images
def test_state_dict_compatible():
"""Test that weights can be loaded."""
new_model.load_state_dict(old_model.state_dict())
Step 6: Create MODEL_CARD.md
Generate comprehensive model card following Hugging Face standards. Include:
- Model description
- Usage examples
- Training details
- Evaluation metrics
- Citation information
See Hugging Face model card documentation for template.
Step 7: Create Hub Push Script
Critical: Register classes with register_for_auto_class() before pushing.
#!/usr/bin/env python3
from huggingface_hub import HfApi
from {module}.transformers import {ModelName}Config, {ModelName}ForTask, {ModelName}ImageProcessor
def main():
# CRITICAL: Register for Auto* support
{ModelName}Config.register_for_auto_class()
{ModelName}ForTask.register_for_auto_class("AutoModel")
{ModelName}ImageProcessor.register_for_auto_class("AutoImageProcessor")
# Load original model
original_model = OriginalModel()
# Create transformers-compatible model
config = {ModelName}Config()
model = {ModelName}ForTask(config)
model.load_state_dict(original_model.state_dict())
# Save and push
model.save_pretrained(local_dir)
config.save_pretrained(local_dir)
processor = {ModelName}ImageProcessor()
processor.save_pretrained(local_dir)
api = HfApi(token=token)
api.create_repo(repo_id=repo_id, exist_ok=True)
api.upload_folder(repo_id=repo_id, folder_path=local_dir)
Implementation Strategy
Validation Mode (Recommended First)
Create parallel implementations:
{project}/
├── {module}/
│ ├── original_model.py # Existing
│ └── transformers/ # NEW - for validation
│ ├── __init__.py
│ ├── configuration_{model}.py
│ ├── modeling_{model}.py
│ └── processing_{model}.py
└── tests/
└── test_transformers_compatibility.py
Workflow:
- Create transformers/ package alongside original
- Run compatibility tests
- Debug any discrepancies
- Once tests pass → proceed to replacement
Replacement Mode (After Validation)
Once equivalence is verified:
{project}/
└── {module}/
├── configuration_{model}.py # Replaces original
├── modeling_{model}.py
└── processing_{model}.py
Workflow:
- Remove or archive original implementation
- Move transformers/* files up one level
- Update all imports
- Update tests
- Update documentation
Common Issues
For detailed troubleshooting, see references/common-pitfalls.md.
Quick reference:
- Hardcoded values: Extract to config with matching defaults
- Preprocessing mismatches: Use exact same pipeline and parameters
- State dict keys: Keep layer names matching original
- Test tolerance: Use 0.5% tolerance (5e-3) for numerical differences
- Device handling: Use
self.devicefrom PreTrainedModel - Image size order:
post_process_semantic_segmentation()expects(width, height)
Debugging Strategy
If outputs don't match:
- Check preprocessing produces identical tensors
- Check state dict loaded correctly
- Step-by-step comparison of each layer
- Check device placement
- Check determinism with torch.manual_seed
See references/common-pitfalls.md for detailed debugging steps.
Project Learnings
After completing a conversion, add learnings to references/learnings.md.
This accumulates knowledge from each project to avoid repeating mistakes.
Checklist
- Step 1: Model analyzed
- Architecture identified
- Hyperparameters extracted
- Preprocessing understood
- Step 2: Config class created
- All hardcoded values moved to config
- Defaults match pretrained weights
- Type hints added
- Step 3: Model class created
- Inherits from PreTrainedModel
- Uses only config parameters
- Forward signature matches conventions
- Step 4: Processor created
- Preprocessing matches original
- Returns appropriate format
- Step 5: Tests created
- Preprocessing test passes
- Single image test passes
- Batch test passes
- State dict test passes
- Step 6: MODEL_CARD.md created
- All sections filled
- Usage examples tested
- Citations included
- Step 7: Push script created
- register_for_auto_class() called
- Script tested locally
- Successfully pushed to Hub
- Cleanup (if doing replacement)
- Original implementation removed/archived
- Imports updated
- Documentation updated