name: CQL Type System & Schema Handling description: Implement and deserialize all CQL types including primitives (int, text, timestamp, uuid, varint, decimal), collections (list, set, map), tuples, UDTs (user-defined types), and frozen types. Use when working with CQL type deserialization, schema validation, collection parsing, UDT handling, or type-correct data generation. allowed-tools: Read, Grep, Glob

CQL Type System & Schema Handling

This skill provides guidance on implementing Cassandra CQL type system with schema-provided deserialization.

When to Use This Skill

Implementing CQL type deserializers
Parsing collection types (list, set, map)
Handling User-Defined Types (UDTs)
Working with frozen vs non-frozen types
Tuple deserialization
Schema validation
Type-correct data generation

Core Principles

Schema-Provided Deserialization

Per PRD: schema passed in, not inferred

// Schema provides type information
fn deserialize_cell(
    data: &[u8],
    column_type: &CqlType,  // From schema
) -> Result<CqlValue>

Never try to infer type from data alone - always use schema.

CQL Type Categories

1. Primitive Types

Fixed-Size Primitives

boolean - 1 byte (0x00 or 0x01)
tinyint - 1 byte signed
smallint - 2 bytes signed, big-endian
int - 4 bytes signed, big-endian
bigint - 8 bytes signed, big-endian
float - 4 bytes IEEE 754
double - 8 bytes IEEE 754
date - 4 bytes (days since epoch)
time - 8 bytes (nanoseconds since midnight)

Variable-Size Primitives

text/varchar - UTF-8 encoded string
blob - raw bytes
ascii - ASCII-only string

Special Primitives

uuid/timeuuid - 16 bytes
inet - 4 bytes (IPv4) or 16 bytes (IPv6)
varint - variable-length big integer
decimal - scale (4 bytes) + unscaled varint
duration - months, days, nanoseconds (3 VInts)
timestamp - 8 bytes (milliseconds since Unix epoch)

2. Collection Types

See collections-and-udts.md for detailed format.

Collection Format:

[4 bytes: element_count (big-endian)]
[for each element:]
    [4 bytes: element_size (big-endian)]
    [bytes: element_data]

Types:

list<T> - Ordered, allows duplicates
set<T> - Unordered, no duplicates
map<K,V> - Key-value pairs

3. Tuple Types

Format:

[element_1_data]
[element_2_data]
...

No size prefix - elements serialized back-to-back. Each element uses its type's serialization.

4. User-Defined Types (UDTs)

Format:

[for each field in schema order:]
    [4 bytes: field_size (-1 for null, 0 for empty, >0 for data)]
    [if size > 0:]
        [bytes: field_data]

UDT schema defines field names and types.

5. Frozen vs Non-Frozen

Frozen types:

Serialized as single blob
Cannot update individual elements
Used in primary keys
Nested collections must be frozen

Non-frozen collections:

Can update individual elements
Only allowed at top level (not nested)
Uses tombstones for deletions

Type Deserialization Patterns

Zero-Copy Pattern

use bytes::Bytes;

fn deserialize_text(data: Bytes) -> Result<String> {
    // Zero-copy: validate UTF-8 then wrap
    let s = std::str::from_utf8(&data)?;
    Ok(s.to_string())  // Only copy if needed
}

fn deserialize_blob(data: Bytes) -> Result<Bytes> {
    // Zero-copy: just return the slice
    Ok(data)
}

Length-Prefixed Pattern

fn deserialize_length_prefixed(data: &[u8]) -> Result<(Bytes, &[u8])> {
    if data.len() < 4 {
        return Err(Error::NotEnoughBytes);
    }
    
    let size = i32::from_be_bytes([data[0], data[1], data[2], data[3]]);
    
    if size < 0 {
        return Ok((Bytes::new(), &data[4..]));  // Null
    }
    
    let size = size as usize;
    if data.len() < 4 + size {
        return Err(Error::NotEnoughBytes);
    }
    
    let value = Bytes::copy_from_slice(&data[4..4 + size]);
    let remaining = &data[4 + size..];
    
    Ok((value, remaining))
}

Collection Pattern

fn deserialize_list(
    data: &[u8],
    element_type: &CqlType,
) -> Result<Vec<CqlValue>> {
    let count = i32::from_be_bytes([data[0], data[1], data[2], data[3]]) as usize;
    let mut offset = 4;
    let mut elements = Vec::with_capacity(count);
    
    for _ in 0..count {
        let (element_data, remaining) = deserialize_length_prefixed(&data[offset..])?;
        let element = deserialize_value(&element_data, element_type)?;
        elements.push(element);
        offset = data.len() - remaining.len();
    }
    
    Ok(elements)
}

Schema Handling

Schema Sources

Statistics.db: Serialization header with column definitions
System tables: system_schema.tables, system_schema.columns
CQL schema file: For test data generation

Schema Representation

struct TableSchema {
    keyspace: String,
    table: String,
    partition_keys: Vec<ColumnDef>,
    clustering_keys: Vec<ColumnDef>,
    regular_columns: Vec<ColumnDef>,
    static_columns: Vec<ColumnDef>,
}

struct ColumnDef {
    name: String,
    cql_type: CqlType,
}

enum CqlType {
    // Primitives
    Boolean,
    Int,
    BigInt,
    Text,
    Uuid,
    Timestamp,
    // ... more primitives
    
    // Collections
    List(Box<CqlType>),
    Set(Box<CqlType>),
    Map(Box<CqlType>, Box<CqlType>),
    
    // Complex
    Tuple(Vec<CqlType>),
    Udt(UdtDef),
    
    // Modifiers
    Frozen(Box<CqlType>),
}

PRD Alignment

Supports Milestone M1 (Core Reading Library):

All CQL types including collections & UDTs
Schema-provided deserialization (not inferred)
Zero-copy patterns where possible

Supports Milestone M5 (Write Support):

Type-correct serialization
Schema validation

Common Pitfalls

1. Inferring Types

❌ Wrong: Look at data to guess type ✅ Right: Use schema to know type

2. Copying Unnecessarily

❌ Wrong: Vec<u8> for every field ✅ Right: Bytes with zero-copy slicing

3. Ignoring Null Handling

❌ Wrong: Assume all fields present ✅ Right: Check for null (-1 size prefix)

4. Frozen Semantics

❌ Wrong: Try to update frozen collection elements ✅ Right: Replace entire frozen value

5. Nested Collections

❌ Wrong: Allow non-frozen nested collections ✅ Right: Nested collections must be frozen

Type System References

Detailed specifications in:

cql-types-reference.md - Complete type catalog
collections-and-udts.md - Collection and UDT formats

Testing

Generate type-correct test data:

# Use test-data-management skill for Docker-based generation
cd test-data
./scripts/start-clean.sh
./scripts/generate.sh

Validate parsing against sstabledump:

sstabledump test-data/datasets/sstables/keyspace/table/*.db

Next Steps

When adding new type support:

Add to CqlType enum
Implement deserializer with zero-copy where possible
Add serializer (for M5 write support)
Create property tests with edge cases
Generate test data with type
Validate against sstabledump

Name	cql-type-system-schema-handling
Description	Implement and deserialize all CQL types including primitives (int, text, timestamp, uuid, varint, decimal), collections (list, set, map), tuples, UDTs (user-defined types), and frozen types. Use when working with CQL type deserialization, schema validation, collection parsing, UDT handling, or type-correct data generation.

cql-type-system-schema-handling

SKILL.md