Agent Skill
2/7/2026

python-data-wrangling

Modern data wrangling with pandas and polars. Use this skill when working with tabular data, choosing between pandas and polars, or writing idiomatic data manipulation code. Covers method chaining, idiomatic operations, performance considerations, and migration between libraries.

justanesta
npx skills add justanesta/claude-code-resources

SKILL.md


Python Data Wrangling

Modern patterns for pandas and polars data manipulation.

Decision Matrix: Pandas vs Polars

| Factor | Pandas | Polars | Winner |
|---|---|---|---|
| Data size | <1GB | >1GB, especially >10GB | Polars for large data |
| Query optimization | No | Yes (lazy evaluation) | Polars |
| Ecosystem integration | Vast (sklearn, viz) | Growing | Pandas for ML/viz |
| API familiarity | DataFrame standard | Rust-inspired | Pandas for teams |
| Performance | Good | Excellent (2-10x) | Polars |
| Memory usage | Higher | Lower | Polars |

General guidance:

  • Use pandas when: <1GB data, heavy ML/viz integration, team familiarity critical
  • Use polars when: >1GB data, performance critical, greenfield projects

Modern Pandas Patterns

Method Chaining

Chain operations for readability

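import pandas as pd

result = (
    df
    .assign(
        total=lambda x: x["price"] * x["quantity"],
        date=lambda x: pd.to_datetime(x["date"]),
    )
    .query("total > 100")
    .groupby("category")
    # Named aggregation keeps the output columns flat (no MultiIndex)
    .agg(total_sum=("total", "sum"), total_mean=("total", "mean"))
    .reset_index()
    # Sort after aggregating; a sort before groupby does not change sum/mean
    .sort_values("total_sum", ascending=False)
)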

See pandas-method-chaining.md for:

  • Lambda vs direct assignment
  • Pipe with custom functions
  • Handling complex transformations
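The first bullet, lambda versus direct assignment, can be illustrated with a small sketch. The column names and values below are made up for the example. The point is that a lambda passed to `.assign()` receives the frame as it exists at that step of the chain, so it can see columns and filters applied earlier, while direct assignment works on a named variable and mutates it in place.

```python
import pandas as pd

# Illustrative data; not taken from the skill's reference files
df = pd.DataFrame({
    "price": [10.0, 20.0, 30.0, 40.0],
    "quantity": [1, 2, 3, 4],
})

# Direct assignment: mutates the frame in place and cannot sit inside a chain
direct = df.copy()
direct["total"] = direct["price"] * direct["quantity"]

# Lambda assignment: each lambda sees the intermediate frame, so "share"
# is computed over the rows that survived the filter
chained = (
    df
    .assign(total=lambda x: x["price"] * x["quantity"])
    .query("total > 10")
    .assign(share=lambda x: x["total"] / x["total"].sum())
)

print(chained)
```

The original `df` is left untouched by the chained version, which is usually what you want in a notebook that gets re-run out of order.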

Idiomatic Operations

import numpy as np
import pandas as pd

# Use .loc for explicit indexing
df.loc[df["score"] > 80, "grade"] = "A"

# Use .pipe() for custom transformations
# (normalize_columns and remove_duplicates are your own functions that
# take a DataFrame and return a DataFrame)
result = df.pipe(normalize_columns).pipe(remove_duplicates)

# Use .assign() for new columns
df = df.assign(
    log_value=lambda x: np.log(x["value"]),
    is_high=lambda x: x["value"] > x["value"].median(),
)
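The `.pipe()` example above relies on `normalize_columns` and `remove_duplicates`, which the skill does not define. A minimal sketch of what such helpers might look like follows; the bodies, the `subset` parameter, and the sample data are assumptions made for illustration only.

```python
import pandas as pd


def normalize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: strip, lower-case, and snake_case column names."""
    return df.rename(columns=lambda c: c.strip().lower().replace(" ", "_"))


def remove_duplicates(df: pd.DataFrame, subset=None) -> pd.DataFrame:
    """Hypothetical helper: drop duplicate rows and renumber the index."""
    return df.drop_duplicates(subset=subset).reset_index(drop=True)


raw = pd.DataFrame({
    " Customer ID": [1, 1, 2],
    "Order Total ": [9.5, 9.5, 20.0],
})

# Extra arguments after the function are forwarded by .pipe()
clean = raw.pipe(normalize_columns).pipe(remove_duplicates, subset=["customer_id"])

print(clean)
```

Keeping each helper pure (DataFrame in, new DataFrame out) is what lets them compose in a chain without side effects on `raw`.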

GroupBy Patterns

# Named aggregations (pandas 0.25+)
summary = df.groupby("category").agg(
    total_sales=("sales", "sum"),
    avg_sales=("sales", "mean"),
    num_transactions=("sales", "count")
)

See pandas-groupby-patterns.md for:

  • Window functions
  • Multiple grouping levels
  • Custom aggregations
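As a rough illustration of two of the topics listed above, the sketch below shows window-style calculations with `transform` and `cumsum`, plus a custom aggregation passed to named `agg`. The data and the `sales_range` function are invented for the example and are not taken from pandas-groupby-patterns.md.

```python
import pandas as pd

df = pd.DataFrame({
    "category": ["a", "a", "b", "b", "b"],
    "sales": [10, 30, 5, 15, 40],
})

# Window-style: transform and cumsum return one value per input row,
# so the results align with the original frame
windowed = df.assign(
    category_total=lambda x: x.groupby("category")["sales"].transform("sum"),
    share=lambda x: x["sales"] / x["category_total"],
    running=lambda x: x.groupby("category")["sales"].cumsum(),
)


def sales_range(s: pd.Series):
    """Illustrative custom aggregation: spread between max and min."""
    return s.max() - s.min()


# Custom aggregation: any callable that reduces a Series to a scalar
summary = df.groupby("category").agg(
    total_sales=("sales", "sum"),
    sales_range=("sales", sales_range),
)

print(windowed)
print(summary)
```

Built-in aggregation names such as `"sum"` are generally faster than Python callables, so a custom function is best reserved for logic pandas does not already provide.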

Polars Patterns

Lazy Evaluation

Use lazy API for query optimization

import polars as pl

result = (
    pl.scan_csv("data.csv")  # Lazy
    .filter(pl.col("value") > 100)
    .group_by("category")
    .agg([
        pl.col("sales").sum().alias("total_sales"),
        pl.col("sales").mean().alias("avg_sales")
    ])
    .collect()  # Execute
)

See polars-lazy-evaluation.md for:

  • When lazy helps vs hurts
  • Streaming for huge data
  • Query plan inspection

Polars Expressions

Use expressions for vectorized operations

result = df.select([
    pl.col("name"),
    (pl.col("salary") * 1.1).alias("new_salary"),
    pl.when(pl.col("age") > 30)
      .then(pl.lit("senior"))
      .otherwise(pl.lit("junior"))
      .alias("level")
])

See polars-expressions.md for:

  • Expression composition
  • when().then().otherwise() patterns
  • List and struct operations

Migration Guide

Pandas → Polars

| Pandas | Polars | Notes |
|---|---|---|
| df["col"] | df["col"] or pl.col("col") | Expressions preferred |
| df[df["x"] > 5] | df.filter(pl.col("x") > 5) | Method-based |
| df.groupby("x").agg({"y": "sum"}) | df.group_by("x").agg(pl.col("y").sum()) | Expression-based |

See migration-pandas-polars.md for complete patterns.

Performance Tips

Pandas Performance

# Use vectorized operations instead of row-by-row loops
df["result"] = df["a"] + df["b"]

# Use categorical for low-cardinality columns
df["category"] = df["category"].astype("category")

# Use eval for arithmetic on large frames; returning a new frame avoids inplace=True
df = df.eval("total = price * quantity")
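The categorical tip is easy to verify by measuring memory before and after the conversion. The sketch below uses synthetic data with four distinct values; the actual savings depend on cardinality, string length, and pandas version, so no figures are quoted.

```python
import pandas as pd

n = 100_000
df = pd.DataFrame({"category": ["north", "south", "east", "west"] * (n // 4)})

# deep=True counts the string payloads, not just the pointers
as_strings = df["category"].memory_usage(deep=True)
as_category = df["category"].astype("category").memory_usage(deep=True)

print(f"strings:  {as_strings:,} bytes")
print(f"category: {as_category:,} bytes")
```

A categorical column stores each distinct value once plus a compact integer code per row, which is why it pays off only when the number of distinct values is small relative to the number of rows.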

Polars Performance

# Use scan instead of read to get a LazyFrame
lf = pl.scan_csv("data.csv")

# Use the streaming engine for data larger than memory
# (older Polars releases spell this lf.collect(streaming=True))
result = lf.collect(engine="streaming")

Anti-Patterns to Avoid

| Avoid | Use Instead |
|---|---|
| df.iterrows() | Vectorized operations |
| Chained indexing df["a"]["b"] = x | .loc |
| Growing DataFrames in loops | pd.concat() outside loop |
| Mixed types in columns | Consistent types |
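Each row of the table can be shown as a short pandas snippet. The data below is invented, and the replacements are minimal sketches of the recommended alternative rather than the only way to write them.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# Instead of iterrows(): operate on whole columns at once
df["total"] = df["a"] + df["b"]

# Instead of chained indexing (df[df["a"] > 1]["b"] = 0): a single .loc call
df.loc[df["a"] > 1, "b"] = 0

# Instead of growing a DataFrame inside a loop: collect pieces, concat once
pieces = [pd.DataFrame({"batch": [i], "value": [i * i]}) for i in range(3)]
combined = pd.concat(pieces, ignore_index=True)

# Instead of mixed types in a column: coerce to one dtype up front
mixed = pd.Series([1, "2", 3.5])
numeric = pd.to_numeric(mixed)

print(df)
print(combined)
print(numeric.dtype)
```

Chained indexing is worth singling out because the assignment may land on a temporary copy, so the original frame can silently stay unchanged.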

source: pandas user guide, polars documentation
