Agent Skill
2/7/2026

cql-type-system-schema-handling

Implement and deserialize all CQL types including primitives (int, text, timestamp, uuid, varint, decimal), collections (list, set, map), tuples, UDTs (user-defined types), and frozen types. Use when working with CQL type deserialization, schema validation, collection parsing, UDT handling, or type-correct data generation.

pmcfadin
npx skills add pmcfadin/cqlite

SKILL.md


name: CQL Type System & Schema Handling
description: Implement and deserialize all CQL types including primitives (int, text, timestamp, uuid, varint, decimal), collections (list, set, map), tuples, UDTs (user-defined types), and frozen types. Use when working with CQL type deserialization, schema validation, collection parsing, UDT handling, or type-correct data generation.
allowed-tools: Read, Grep, Glob

CQL Type System & Schema Handling

This skill provides guidance on implementing the Cassandra CQL type system, with deserialization driven by a caller-provided schema.

When to Use This Skill

  • Implementing CQL type deserializers
  • Parsing collection types (list, set, map)
  • Handling User-Defined Types (UDTs)
  • Working with frozen vs non-frozen types
  • Tuple deserialization
  • Schema validation
  • Type-correct data generation

Core Principles

Schema-Provided Deserialization

Per the PRD, the schema is passed in by the caller; it is never inferred from the data.

// Schema provides type information
fn deserialize_cell(
    data: &[u8],
    column_type: &CqlType,  // From schema
) -> Result<CqlValue>

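Never try to infer a type from the data alone; always use the schema. The bytes are ambiguous on their own: an 8-byte value could equally be a bigint, a timestamp, a double, or a time.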
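To illustrate why the schema has to drive decoding, here is a minimal sketch in which the same 8 bytes produce different values depending on the column type. The trimmed-down `CqlType` and `CqlValue` enums and the `String` error type are hypothetical stand-ins for this example, not the project's actual definitions.

```rust
use std::convert::TryFrom;

// Hypothetical, trimmed-down enums used only for this illustration.
#[derive(Debug, PartialEq)]
enum CqlType {
    BigInt,
    Timestamp,
    Double,
}

#[derive(Debug, PartialEq)]
enum CqlValue {
    BigInt(i64),
    TimestampMillis(i64),
    Double(f64),
}

/// Decode an 8-byte cell according to the type the schema declares.
fn deserialize_cell(data: &[u8], column_type: &CqlType) -> Result<CqlValue, String> {
    // All three types in this sketch are exactly 8 bytes wide.
    let raw = <[u8; 8]>::try_from(data)
        .map_err(|_| format!("expected 8 bytes, got {}", data.len()))?;
    Ok(match column_type {
        CqlType::BigInt => CqlValue::BigInt(i64::from_be_bytes(raw)),
        CqlType::Timestamp => CqlValue::TimestampMillis(i64::from_be_bytes(raw)),
        CqlType::Double => CqlValue::Double(f64::from_be_bytes(raw)),
    })
}
```

Feeding the big-endian bytes of `1.0f64` through this function yields `Double(1.0)` for a double column but a large integer for a bigint column, which is exactly the ambiguity the schema resolves.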

CQL Type Categories

1. Primitive Types

Fixed-Size Primitives

  • boolean - 1 byte (0x00 or 0x01)
  • tinyint - 1 byte signed
  • smallint - 2 bytes signed, big-endian
  • int - 4 bytes signed, big-endian
  • bigint - 8 bytes signed, big-endian
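  • float - 4 bytes IEEE 754, big-endian
  • double - 8 bytes IEEE 754, big-endian
  • date - 4 bytes unsigned, big-endian (days relative to the Unix epoch, offset so that 2^31 is 1970-01-01)
  • time - 8 bytes signed, big-endian (nanoseconds since midnight)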
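The fixed-size layouts above map directly onto Rust's `from_be_bytes` helpers. The following is a small sketch with hypothetical function names; it returns `None` on a wrong length rather than using the project's error type. Note that `date` is stored as an unsigned value with 1970-01-01 at 2^31, so the sketch subtracts that offset to get signed days since the epoch.

```rust
use std::convert::TryInto;

/// Decode a CQL `int`: 4 bytes, signed, big-endian.
fn read_int(data: &[u8]) -> Option<i32> {
    let raw: [u8; 4] = data.try_into().ok()?;
    Some(i32::from_be_bytes(raw))
}

/// Decode a CQL `bigint`: 8 bytes, signed, big-endian.
fn read_bigint(data: &[u8]) -> Option<i64> {
    let raw: [u8; 8] = data.try_into().ok()?;
    Some(i64::from_be_bytes(raw))
}

/// Decode a CQL `boolean`: exactly one byte, 0x00 or 0x01.
/// This sketch is strict and rejects any other byte value.
fn read_boolean(data: &[u8]) -> Option<bool> {
    match data {
        [0x00] => Some(false),
        [0x01] => Some(true),
        _ => None,
    }
}

/// Decode a CQL `date` into signed days relative to 1970-01-01.
/// The stored value is unsigned, with the epoch sitting at 2^31.
fn read_date_days(data: &[u8]) -> Option<i64> {
    let raw: [u8; 4] = data.try_into().ok()?;
    Some(i64::from(u32::from_be_bytes(raw)) - (1i64 << 31))
}
```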

Variable-Size Primitives

  • text/varchar - UTF-8 encoded string
  • blob - raw bytes
  • ascii - ASCII-only string

Special Primitives

  • uuid/timeuuid - 16 bytes
  • inet - 4 bytes (IPv4) or 16 bytes (IPv6)
  • varint - arbitrary-length integer, two's complement, big-endian
  • decimal - scale (4-byte big-endian int) followed by the unscaled value as a varint
  • duration - months, days, nanoseconds (3 signed VInts)
  • timestamp - 8 bytes signed, big-endian (milliseconds since Unix epoch)
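As a rough illustration of the varint and decimal layouts, the sketch below sign-extends a two's-complement big-endian varint into an `i128` and splits a decimal into its unscaled value and scale. The function names are hypothetical, and capping the value at 16 bytes is a simplification made for this example; a real implementation would need an arbitrary-precision integer type.

```rust
/// Decode a varint of at most 16 bytes into an i128.
/// Returns None for empty input or values too wide for this sketch.
fn read_varint_i128(data: &[u8]) -> Option<i128> {
    if data.is_empty() || data.len() > 16 {
        return None;
    }
    // Sign-extend: pad with 0xFF for negative values, 0x00 otherwise.
    let fill: u8 = if data[0] & 0x80 != 0 { 0xFF } else { 0x00 };
    let mut buf = [fill; 16];
    buf[16 - data.len()..].copy_from_slice(data);
    Some(i128::from_be_bytes(buf))
}

/// Decode a decimal into (unscaled value, scale).
/// The numeric value is unscaled * 10^(-scale).
fn read_decimal(data: &[u8]) -> Option<(i128, i32)> {
    if data.len() < 5 {
        return None; // 4-byte scale plus at least one varint byte
    }
    let (scale_bytes, unscaled) = data.split_at(4);
    let scale = i32::from_be_bytes([
        scale_bytes[0],
        scale_bytes[1],
        scale_bytes[2],
        scale_bytes[3],
    ]);
    Some((read_varint_i128(unscaled)?, scale))
}
```

For example, the bytes `00 00 00 02 30 39` decode to an unscaled value of 12345 with scale 2, which represents 123.45.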

2. Collection Types

See collections-and-udts.md for detailed format.

Collection Format:

[4 bytes: element_count (big-endian)]
[for each element:]
    [4 bytes: element_size (big-endian)]
    [bytes: element_data]

Types:

  • list<T> - Ordered, allows duplicates
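  • set<T> - Unique elements, kept in sorted order by the element type's comparator
  • map<K,V> - Key-value pairs sorted by key; each entry is a length-prefixed key followed by a length-prefixed value, and element_count counts entries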
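For maps, the count prefix is followed by alternating length-prefixed keys and values. The sketch below splits a serialized map into borrowed key/value byte slices without decoding them; the helper names and the `String` error type are assumptions made for this example only.

```rust
use std::convert::TryFrom;

/// Read one `[4-byte length][bytes]` item, returning (item, rest).
fn take_prefixed(data: &[u8]) -> Result<(&[u8], &[u8]), String> {
    if data.len() < 4 {
        return Err("truncated length prefix".to_string());
    }
    let (len_bytes, rest) = data.split_at(4);
    let len = i32::from_be_bytes([len_bytes[0], len_bytes[1], len_bytes[2], len_bytes[3]]);
    // Map keys and values are not expected to be null in this sketch.
    let len = usize::try_from(len).map_err(|_| "negative length".to_string())?;
    if rest.len() < len {
        return Err("truncated element".to_string());
    }
    Ok(rest.split_at(len))
}

/// Split a serialized map into raw (key, value) slices.
fn split_map_entries(data: &[u8]) -> Result<Vec<(&[u8], &[u8])>, String> {
    if data.len() < 4 {
        return Err("truncated entry count".to_string());
    }
    let count = i32::from_be_bytes([data[0], data[1], data[2], data[3]]);
    let count = usize::try_from(count).map_err(|_| "negative entry count".to_string())?;

    let mut rest = &data[4..];
    // No pre-allocation from the untrusted count; a bogus count fails
    // on the first truncated read instead of reserving memory.
    let mut entries = Vec::new();
    for _ in 0..count {
        let (key, after_key) = take_prefixed(rest)?;
        let (value, after_value) = take_prefixed(after_key)?;
        entries.push((key, value));
        rest = after_value;
    }
    Ok(entries)
}
```

Each returned slice can then be handed to the deserializer for the key or value type named in the schema.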

3. Tuple Types

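Format:

[for each component in declared order:]
    [4 bytes: component_size (-1 for null)]
    [if size > 0:]
        [bytes: component_data]

Each component carries a 4-byte big-endian length prefix, the same layout a UDT uses; a tuple is effectively a UDT with positional rather than named fields. The component bytes use that component type's own serialization.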

4. User-Defined Types (UDTs)

Format:

[for each field in schema order:]
    [4 bytes: field_size (-1 for null, 0 for empty, >0 for data)]
    [if size > 0:]
        [bytes: field_data]

UDT schema defines field names and types.
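The field loop above can be sketched as follows. The function name and `String` error type are hypothetical. Treating fields that are missing at the end of the buffer as null is an assumption in this sketch, intended to cover values written before a field was added to the type; verify it against real data before relying on it.

```rust
/// Split a serialized UDT into one optional byte slice per schema field.
/// `field_count` comes from the UDT definition in the schema.
fn split_udt_fields(data: &[u8], field_count: usize) -> Result<Vec<Option<&[u8]>>, String> {
    let mut rest = data;
    let mut fields = Vec::with_capacity(field_count);

    for _ in 0..field_count {
        // Assumption: trailing fields absent from the buffer are null.
        if rest.is_empty() {
            fields.push(None);
            continue;
        }
        if rest.len() < 4 {
            return Err("truncated field size".to_string());
        }
        let size = i32::from_be_bytes([rest[0], rest[1], rest[2], rest[3]]);
        rest = &rest[4..];

        // A negative size means null; zero means present but empty.
        if size < 0 {
            fields.push(None);
            continue;
        }
        let size = size as usize;
        if rest.len() < size {
            return Err("truncated field data".to_string());
        }
        let (value, tail) = rest.split_at(size);
        fields.push(Some(value));
        rest = tail;
    }
    Ok(fields)
}
```

Keeping `None` (null) distinct from `Some(&[])` (empty) preserves the difference between the -1 and 0 size prefixes.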

5. Frozen vs Non-Frozen

Frozen types:

  • Serialized as a single blob
  • Individual elements cannot be updated; the whole value is replaced
  • Required when a collection or UDT is part of a primary key
  • Nested collections must be frozen

Non-frozen collections:

  • Individual elements can be updated
  • Only allowed at top level (not nested)
  • Stored as one cell per element, with tombstones marking deletions
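The nesting rule can be checked mechanically during schema validation. Below is a minimal sketch over a reduced, hypothetical `CqlType` enum: a collection found inside another collection is accepted only if some enclosing `Frozen` wrapper covers it.

```rust
// Reduced, hypothetical type enum for this illustration.
#[derive(Debug, Clone, PartialEq)]
enum CqlType {
    Int,
    Text,
    List(Box<CqlType>),
    Set(Box<CqlType>),
    Map(Box<CqlType>, Box<CqlType>),
    Frozen(Box<CqlType>),
}

/// Returns true if every nested collection in `ty` is frozen.
fn nested_collections_are_frozen(ty: &CqlType) -> bool {
    check_nesting(ty, false, false)
}

// `inside_collection`: an enclosing collection exists.
// `frozen`: an enclosing `Frozen` wrapper exists (freezing applies recursively).
fn check_nesting(ty: &CqlType, inside_collection: bool, frozen: bool) -> bool {
    match ty {
        CqlType::Int | CqlType::Text => true,
        CqlType::Frozen(inner) => check_nesting(inner, inside_collection, true),
        CqlType::List(elem) | CqlType::Set(elem) => {
            (frozen || !inside_collection) && check_nesting(elem, true, frozen)
        }
        CqlType::Map(key, value) => {
            (frozen || !inside_collection)
                && check_nesting(key, true, frozen)
                && check_nesting(value, true, frozen)
        }
    }
}
```

Under this sketch `list<frozen<list<int>>>` passes and `list<list<int>>` is rejected.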

Type Deserialization Patterns

Zero-Copy Pattern

use bytes::Bytes;

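fn deserialize_text(data: Bytes) -> Result<String> {
    // UTF-8 validation borrows the buffer; only to_string() copies.
    // Return a borrowed &str (or keep the validated Bytes) to avoid
    // the copy when the caller does not need an owned String.
    let s = std::str::from_utf8(&data)?;
    Ok(s.to_string())
}

fn deserialize_blob(data: Bytes) -> Result<Bytes> {
    // Zero-copy: Bytes is reference-counted, so this moves a handle
    // to the shared buffer rather than the data itself
    Ok(data)
}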

Length-Prefixed Pattern

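fn deserialize_length_prefixed(data: &[u8]) -> Result<(Option<&[u8]>, &[u8])> {
    if data.len() < 4 {
        return Err(Error::NotEnoughBytes);
    }

    let size = i32::from_be_bytes([data[0], data[1], data[2], data[3]]);
    let rest = &data[4..];

    // A negative size marks a null value, which is distinct from an
    // empty value (size 0), so report it as None rather than empty bytes.
    if size < 0 {
        return Ok((None, rest));
    }

    let size = size as usize;
    if rest.len() < size {
        return Err(Error::NotEnoughBytes);
    }

    // Borrow instead of copying. A caller that owns the buffer as Bytes
    // can turn the borrowed slice back into a Bytes with Bytes::slice_ref.
    let (value, remaining) = rest.split_at(size);
    Ok((Some(value), remaining))
}

Collection Pattern

fn deserialize_list(
    data: &[u8],
    element_type: &CqlType,
) -> Result<Vec<CqlValue>> {
    // Bounds-check the count prefix instead of indexing blindly.
    if data.len() < 4 {
        return Err(Error::NotEnoughBytes);
    }
    let count = i32::from_be_bytes([data[0], data[1], data[2], data[3]]);
    // NotEnoughBytes is reused for corrupt input to keep the sketch short;
    // a dedicated error variant would be clearer.
    let count = usize::try_from(count).map_err(|_| Error::NotEnoughBytes)?;

    // Each element needs at least a 4-byte prefix, so cap the
    // pre-allocation by what the buffer could actually hold.
    let mut elements = Vec::with_capacity(count.min(data.len() / 4));
    let mut rest = &data[4..];

    for _ in 0..count {
        let (element_data, remaining) = deserialize_length_prefixed(rest)?;
        // Collection elements cannot be null, so treat None as corrupt input.
        let element_data = element_data.ok_or(Error::NotEnoughBytes)?;
        elements.push(deserialize_value(element_data, element_type)?);
        rest = remaining;
    }

    Ok(elements)
}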

Schema Handling

Schema Sources

  1. Statistics.db: Serialization header with column definitions
  2. System tables: system_schema.tables, system_schema.columns
  3. CQL schema file: For test data generation

Schema Representation

struct TableSchema {
    keyspace: String,
    table: String,
    partition_keys: Vec<ColumnDef>,
    clustering_keys: Vec<ColumnDef>,
    regular_columns: Vec<ColumnDef>,
    static_columns: Vec<ColumnDef>,
}

struct ColumnDef {
    name: String,
    cql_type: CqlType,
}

enum CqlType {
    // Primitives
    Boolean,
    Int,
    BigInt,
    Text,
    Uuid,
    Timestamp,
    // ... more primitives
    
    // Collections
    List(Box<CqlType>),
    Set(Box<CqlType>),
    Map(Box<CqlType>, Box<CqlType>),
    
    // Complex
    Tuple(Vec<CqlType>),
    Udt(UdtDef),
    
    // Modifiers
    Frozen(Box<CqlType>),
}
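When the schema arrives as text, for example a type string read from system_schema.columns or a CQL schema file, it has to be turned into a `CqlType` tree first. The following is a rough recursive-descent sketch over a reduced enum; the function names, the `String` error type, and the subset of supported type names are all assumptions made for illustration, and UDT names are not handled.

```rust
#[derive(Debug, PartialEq)]
enum CqlType {
    Boolean, Int, BigInt, Text, Uuid, Timestamp,
    List(Box<CqlType>),
    Set(Box<CqlType>),
    Map(Box<CqlType>, Box<CqlType>),
    Tuple(Vec<CqlType>),
    Frozen(Box<CqlType>),
}

/// Parse a CQL type string such as `map<text, frozen<list<int>>>`.
fn parse_cql_type(input: &str) -> Result<CqlType, String> {
    let s = input.trim();
    if let Some(open) = s.find('<') {
        if !s.ends_with('>') {
            return Err(format!("missing '>' in {:?}", s));
        }
        let name = s[..open].trim().to_ascii_lowercase();
        let mut args = Vec::new();
        for arg in split_top_level(&s[open + 1..s.len() - 1]) {
            args.push(parse_cql_type(arg)?);
        }
        return match (name.as_str(), args.len()) {
            ("list", 1) => Ok(CqlType::List(Box::new(args.remove(0)))),
            ("set", 1) => Ok(CqlType::Set(Box::new(args.remove(0)))),
            ("frozen", 1) => Ok(CqlType::Frozen(Box::new(args.remove(0)))),
            ("map", 2) => {
                let value = args.remove(1);
                let key = args.remove(0);
                Ok(CqlType::Map(Box::new(key), Box::new(value)))
            }
            ("tuple", n) if n > 0 => Ok(CqlType::Tuple(args)),
            _ => Err(format!("unsupported parameterized type {:?}", s)),
        };
    }
    match s.to_ascii_lowercase().as_str() {
        "boolean" => Ok(CqlType::Boolean),
        "int" => Ok(CqlType::Int),
        "bigint" => Ok(CqlType::BigInt),
        "text" | "varchar" => Ok(CqlType::Text),
        "uuid" => Ok(CqlType::Uuid),
        "timestamp" => Ok(CqlType::Timestamp),
        other => Err(format!("unknown type {:?}", other)),
    }
}

/// Split on commas that are not nested inside angle brackets.
fn split_top_level(s: &str) -> Vec<&str> {
    let mut parts = Vec::new();
    let (mut depth, mut start) = (0usize, 0usize);
    for (i, c) in s.char_indices() {
        match c {
            '<' => depth += 1,
            '>' => depth = depth.saturating_sub(1),
            ',' if depth == 0 => {
                parts.push(&s[start..i]);
                start = i + 1;
            }
            _ => {}
        }
    }
    parts.push(&s[start..]);
    parts
}
```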

PRD Alignment

Supports Milestone M1 (Core Reading Library):

  • All CQL types including collections & UDTs
  • Schema-provided deserialization (not inferred)
  • Zero-copy patterns where possible

Supports Milestone M5 (Write Support):

  • Type-correct serialization
  • Schema validation

Common Pitfalls

1. Inferring Types

❌ Wrong: Look at data to guess type
✅ Right: Use schema to know type

2. Copying Unnecessarily

❌ Wrong: Vec<u8> for every field
✅ Right: Bytes with zero-copy slicing

3. Ignoring Null Handling

❌ Wrong: Assume all fields present
✅ Right: Check for null (-1 size prefix)

4. Frozen Semantics

❌ Wrong: Try to update frozen collection elements
✅ Right: Replace entire frozen value

5. Nested Collections

❌ Wrong: Allow non-frozen nested collections
✅ Right: Nested collections must be frozen

Type System References

Detailed specifications in:

  • collections-and-udts.md

Testing

Generate type-correct test data:

# Use test-data-management skill for Docker-based generation
cd test-data
./scripts/start-clean.sh
./scripts/generate.sh

Validate parsing against sstabledump:

sstabledump test-data/datasets/sstables/keyspace/table/*.db

Next Steps

When adding new type support:

  1. Add to CqlType enum
  2. Implement deserializer with zero-copy where possible
  3. Add serializer (for M5 write support)
  4. Create property tests with edge cases
  5. Generate test data that includes the new type
  6. Validate against sstabledump
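
Steps 2 to 4 can be exercised together with a round-trip check: serialize a value, deserialize it, and compare. The sketch below does this for `list<int>` using hypothetical helper names and a `String` error type; a property-testing crate could generate the inputs, but plain edge cases are enough to show the shape.

```rust
/// Read a big-endian i32 at `offset`, failing on truncated input.
fn read_i32(data: &[u8], offset: usize) -> Result<i32, String> {
    let end = offset.checked_add(4).ok_or("offset overflow")?;
    let bytes = data.get(offset..end).ok_or("truncated input")?;
    Ok(i32::from_be_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]))
}

/// Serialize a list<int>: count, then a 4-byte size and 4-byte value per element.
fn serialize_int_list(values: &[i32]) -> Vec<u8> {
    let mut out = Vec::with_capacity(4 + values.len() * 8);
    out.extend_from_slice(&(values.len() as i32).to_be_bytes());
    for v in values {
        out.extend_from_slice(&4i32.to_be_bytes());
        out.extend_from_slice(&v.to_be_bytes());
    }
    out
}

/// Deserialize a list<int> produced by `serialize_int_list`.
fn deserialize_int_list(data: &[u8]) -> Result<Vec<i32>, String> {
    let count = read_i32(data, 0)?;
    if count < 0 {
        return Err("negative element count".to_string());
    }
    let mut offset = 4;
    let mut values = Vec::new();
    for _ in 0..count {
        let size = read_i32(data, offset)?;
        if size != 4 {
            return Err(format!("int element must be 4 bytes, got {}", size));
        }
        values.push(read_i32(data, offset + 4)?);
        offset += 8;
    }
    Ok(values)
}
```

Useful edge cases include the empty list, `i32::MIN`, `i32::MAX`, and buffers truncated in the middle of an element.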