Agent Skill
2/7/2026

burn-backends

This skill should be used when the user asks about "Burn backend", "WGPU", "NdArray", "Candle", "LibTorch", "custom kernel", "CubeCL", "quantization", "WebAssembly", "no_std", or backend selection and extension.

johnzfitch
npx skills add johnzfitch/burn-plugin

SKILL.md

name: burn-backends
description: This skill should be used when the user asks about "Burn backend", "WGPU", "NdArray", "Candle", "LibTorch", "custom kernel", "CubeCL", "quantization", "WebAssembly", "no_std", or backend selection and extension.
version: 0.1.0

Burn Backends

Knowledge for selecting, configuring, and extending Burn compute backends.

Available Backends

| Backend | Use Case | Feature Flag |
|---|---|---|
| WGPU | GPU, cross-platform | wgpu |
| NdArray | CPU, testing | ndarray |
| Candle | Hugging Face ecosystem | candle |
| LibTorch | PyTorch interop | tch |

Backend Selection

Enable the backend's feature flag in Cargo.toml:

[dependencies]
burn = { version = "0.16", features = ["wgpu"] }

Use in code:

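use burn::backend::wgpu::WgpuDevice;
use burn::backend::{Autodiff, Wgpu};

// Inference backend
type MyBackend = Wgpu;
// Training backend: wrap the base backend with autodiff
type MyAutodiffBackend = Autodiff<Wgpu>;

let device = WgpuDevice::default();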

Device Configuration

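Each backend has its own device type:

// Each import requires the matching feature flag to be enabled.
use burn::backend::libtorch::LibTorchDevice;
use burn::backend::ndarray::NdArrayDevice;
use burn::backend::wgpu::WgpuDevice;

// WGPU
let device = WgpuDevice::default(); // Auto-select the best available adapter
let device = WgpuDevice::Cpu;       // Force a CPU adapter

// NdArray (CPU only)
let device = NdArrayDevice::Cpu;

// LibTorch
let device = LibTorchDevice::Cuda(0); // GPU 0
let device = LibTorchDevice::Cpu;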

Backend Portability

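Write backend-agnostic code by making functions generic over the backend trait:

use burn::tensor::backend::AutodiffBackend;

// Model and ModelConfig stand for your own module and its config type.
fn train<B: AutodiffBackend>(device: B::Device) {
    let model: Model<B> = ModelConfig::new().init(&device);
    // Training code works with any backend
}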
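
The pattern behind this portability can be sketched without Burn at all. The following is a toy model using only the standard library: the trait `ToyBackend`, the two backends, and `sum_of_squares` are hypothetical names for illustration and are not part of Burn's API. It shows how a function written once against a trait with an associated `Device` type runs unchanged on different backends.

```rust
// Toy model of the backend-generic pattern. All names here are
// hypothetical and only illustrate the idea; none of them come from Burn.

/// A backend bundles a device type with the operations that run on it.
trait ToyBackend {
    type Device: Default;

    /// Human-readable backend name.
    fn name() -> &'static str;

    /// Element-wise product of two equally sized slices.
    fn mul(device: &Self::Device, a: &[f32], b: &[f32]) -> Vec<f32>;
}

/// Backend that processes the whole slice in one pass.
struct CpuBackend;

#[derive(Default)]
struct CpuDevice;

impl ToyBackend for CpuBackend {
    type Device = CpuDevice;

    fn name() -> &'static str {
        "cpu"
    }

    fn mul(_device: &CpuDevice, a: &[f32], b: &[f32]) -> Vec<f32> {
        a.iter().zip(b).map(|(x, y)| x * y).collect()
    }
}

/// Backend that processes the slice in fixed-size chunks, standing in
/// for a different execution strategy that yields the same result.
struct ChunkedBackend;

struct ChunkedDevice {
    chunk: usize,
}

impl Default for ChunkedDevice {
    fn default() -> Self {
        Self { chunk: 2 }
    }
}

impl ToyBackend for ChunkedBackend {
    type Device = ChunkedDevice;

    fn name() -> &'static str {
        "chunked"
    }

    fn mul(device: &ChunkedDevice, a: &[f32], b: &[f32]) -> Vec<f32> {
        let mut out = Vec::with_capacity(a.len());
        for (ca, cb) in a.chunks(device.chunk).zip(b.chunks(device.chunk)) {
            out.extend(ca.iter().zip(cb).map(|(x, y)| x * y));
        }
        out
    }
}

/// Generic code: written once, runs on any backend.
fn sum_of_squares<B: ToyBackend>(device: &B::Device, xs: &[f32]) -> f32 {
    B::mul(device, xs, xs).iter().sum()
}
```

Calling `sum_of_squares::<CpuBackend>(&CpuDevice::default(), &xs)` and `sum_of_squares::<ChunkedBackend>(&ChunkedDevice::default(), &xs)` exercises the same function body on both backends; only the type parameter at the call site changes.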

Custom Kernels (CubeCL)

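For performance-critical operations, write a GPU kernel with CubeCL:

use cubecl::prelude::*; // burn_cube was renamed to cubecl

#[cube(launch)]
fn custom_kernel<F: Float>(input: &Tensor<F>, output: &mut Tensor<F>) {
    // One thread per element; ABSOLUTE_POS is this thread's global index.
    let idx = ABSOLUTE_POS;
    // Guard against threads launched past the end of the tensor.
    if idx < output.len() {
        output[idx] = input[idx] * F::new(2.0);
    }
}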
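
When developing a kernel like this one, a plain CPU reference is useful as a test oracle. The sketch below uses only the standard library; `double_reference`, `simulate_launch`, and `max_abs_diff` are hypothetical helper names, not CubeCL or Burn functions. `simulate_launch` imitates the one-thread-per-index execution model sequentially, including a bounds check for the case where more threads are launched than there are elements.

```rust
/// CPU reference for the element-wise kernel: output[i] = input[i] * 2.
fn double_reference(input: &[f32]) -> Vec<f32> {
    input.iter().map(|x| x * 2.0).collect()
}

/// Sequential imitation of a kernel launch: each loop iteration plays the
/// role of one thread whose global index is `idx`.
fn simulate_launch(input: &[f32], num_threads: usize) -> Vec<f32> {
    let mut output = vec![0.0; input.len()];
    for idx in 0..num_threads {
        // Threads past the end of the buffer must do nothing.
        if idx < input.len() {
            output[idx] = input[idx] * 2.0;
        }
    }
    output
}

/// Largest absolute difference between two equally sized buffers.
fn max_abs_diff(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "buffers must have the same length");
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y).abs())
        .fold(0.0, f32::max)
}
```

In a real test, the buffer read back from the GPU would take the place of the `simulate_launch` result, and `max_abs_diff` against `double_reference` would be compared to a small tolerance.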

Quantization

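Quantization reduces model size and can improve inference speed. Support is experimental and type names have changed between Burn releases, so check the documentation for your version:

use burn::module::Quantizer;
use burn::tensor::quantization::{Calibration, QuantizationScheme, QuantizationType};

let mut quantizer = Quantizer {
    calibration: Calibration::MinMax,
    scheme: QuantizationScheme::PerTensorSymmetric(QuantizationType::QInt8),
};
let quantized_model = model.quantize_weights(&mut quantizer);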
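
To make the symmetric scheme concrete, here is a minimal standard-library sketch of per-tensor symmetric int8 quantization. It is not Burn's implementation; `quantize_symmetric` and `dequantize` are hypothetical names, and the choice of scale = max|x| / 127 is one common convention assumed here for illustration.

```rust
/// Symmetric per-tensor int8 quantization.
/// Uses scale = max|x| / 127 and q = round(x / scale), so 0.0 maps to 0.
fn quantize_symmetric(values: &[f32]) -> (Vec<i8>, f32) {
    let max_abs = values.iter().fold(0.0_f32, |m, v| m.max(v.abs()));
    // Avoid dividing by zero for an all-zero tensor.
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
    let quantized = values
        .iter()
        .map(|v| (v / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    (quantized, scale)
}

/// Maps int8 values back to floats: x is approximately q * scale.
fn dequantize(quantized: &[i8], scale: f32) -> Vec<f32> {
    quantized.iter().map(|&q| f32::from(q) * scale).collect()
}
```

Each value is stored in one byte instead of four, plus a single `f32` scale per tensor. The round trip through `dequantize` is lossy: each element can be off by up to about half a quantization step (`scale / 2`).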

WebAssembly Deployment

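Burn supports WASM targets. wasm-bindgen is a separate crate rather than a Burn feature flag, so add it as its own dependency:

[dependencies]
burn = { version = "0.16", features = ["wgpu"] }
wasm-bindgen = "0.2"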

Build with:

wasm-pack build --target web

Additional Resources

Consult references/topic-map-backends.md for:

  • Detailed backend comparison
  • Custom WGPU kernel guide
  • Quantization schemes and calibration
  • No-std embedded deployment