10 Pandas Tricks That Will Save You Hours

May 14, 2026 · 4 min read · by StartD Editorial
Pandas is so flexible that most people use 10% of it badly. These ten patterns will clean up almost any notebook.

1. Method chaining beats temporary variables

# Instead of (in pandas before 3.0, writing into the filtered df2 can also trigger SettingWithCopyWarning):
df2 = df[df.country == "BR"]
df2["revenue_k"] = df2.revenue / 1000
result = df2.groupby("city")["revenue_k"].sum()

# Do:
result = (
    df.query("country == 'BR'")
      .assign(revenue_k=lambda d: d.revenue / 1000)
      .groupby("city")["revenue_k"]
      .sum()
)
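
To see that the two styles really are interchangeable, here is a minimal sketch that runs both on a tiny frame. The sample rows are invented for illustration, and the step-by-step version adds a .copy() (not in the snippet above) so the write never touches a slice of df.

```python
import pandas as pd

# Hypothetical sample data, invented purely for illustration.
df = pd.DataFrame({
    "country": ["BR", "BR", "PT", "BR"],
    "city": ["Recife", "Recife", "Lisboa", "Natal"],
    "revenue": [1000.0, 2500.0, 4000.0, 500.0],
})

# Step-by-step version; .copy() makes df2 an independent frame.
df2 = df[df.country == "BR"].copy()
df2["revenue_k"] = df2.revenue / 1000
stepwise = df2.groupby("city")["revenue_k"].sum()

# Chained version: no intermediate names, and df is never modified.
chained = (
    df.query("country == 'BR'")
      .assign(revenue_k=lambda d: d.revenue / 1000)
      .groupby("city")["revenue_k"]
      .sum()
)

# Both routes should produce the same Series.
same = stepwise.equals(chained)
```

The chain reads top to bottom as filter, derive, group, aggregate, which is the main reason to prefer it.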

2. query() is faster to read than boolean masks

df.query("age > 30 and country in ['BR', 'PT']")
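
For comparison, the sketch below writes the same filter as a boolean mask and as a query() string, then shows how query() can read local variables through the @ prefix. The sample frame and the names min_age and markets are made up for this example.

```python
import pandas as pd

# Hypothetical sample data, invented purely for illustration.
df = pd.DataFrame({
    "age": [25, 41, 35, 52],
    "country": ["BR", "PT", "US", "BR"],
})

# Boolean-mask version: parentheses and & are required.
masked = df[(df["age"] > 30) & (df["country"].isin(["BR", "PT"]))]

# query() version of the same filter.
queried = df.query("age > 30 and country in ['BR', 'PT']")

# Local Python variables can be referenced with an @ prefix.
min_age = 30
markets = ["BR", "PT"]
parameterized = df.query("age > @min_age and country in @markets")
```

All three expressions should select the same rows; the query() forms simply keep the condition closer to how you would say it out loud.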

3. assign() for new columns inside a chain

df.assign(
    margin=lambda d: d.revenue - d.cost,
    margin_pct=lambda d: d.margin / d.revenue,
)
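
One detail worth seeing in action: keyword arguments to assign() are evaluated in order, so margin_pct can use the margin column created just before it. A small runnable sketch, with invented numbers:

```python
import pandas as pd

# Hypothetical sample data, invented purely for illustration.
df = pd.DataFrame({"revenue": [200.0, 500.0], "cost": [150.0, 250.0]})

out = df.assign(
    margin=lambda d: d.revenue - d.cost,
    # d here already contains the margin column defined one line above.
    margin_pct=lambda d: d.margin / d.revenue,
)

# assign() returns a new frame; the original keeps its two columns.
original_columns = list(df.columns)
```

Because assign() returns a new DataFrame rather than mutating df, it slots into a chain without side effects.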

4. pipe() for custom steps

def winsorize(df, col, q=0.01):
    lo, hi = df[col].quantile([q, 1 - q])
    return df.assign(**{col: df[col].clip(lo, hi)})

df.pipe(winsorize, "revenue")
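
A quick way to convince yourself what pipe() does is to run the function both ways. The sketch below reuses winsorize from above on invented data, with q=0.25 chosen only so the clipping is visible on five rows.

```python
import pandas as pd

def winsorize(df, col, q=0.01):
    # Clip values in `col` to the [q, 1 - q] quantile range.
    lo, hi = df[col].quantile([q, 1 - q])
    return df.assign(**{col: df[col].clip(lo, hi)})

# Hypothetical sample data with one obvious outlier.
df = pd.DataFrame({"revenue": [1.0, 2.0, 3.0, 4.0, 1000.0]})

# pipe() passes df as the first argument, then forwards the rest.
piped = df.pipe(winsorize, "revenue", q=0.25)

# Equivalent direct call.
direct = winsorize(df, "revenue", q=0.25)
```

The two calls are equivalent; pipe() only matters because it lets the custom step sit in the middle of a method chain.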

5–10: the rest

  • Use .loc[] for assignment, never chained indexing.
  • For string columns with few distinct values, the category dtype saves memory and usually speeds up groupby.
  • value_counts(normalize=True) gives you proportions in one call.
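  • merge() with indicator=True adds a _merge column (left_only, right_only, both), which makes join mismatches easy to spot.
  • convert_dtypes() switches columns to nullable dtypes, so integers with missing values stay integers and booleans stay booleans.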
  • Profile with df.memory_usage(deep=True) before optimizing anything.
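
The tips in this list fit in a few lines each. Here is a rough sketch that touches them in turn; the orders, customers, and labels frames are hypothetical and exist only to make the snippet runnable.

```python
import pandas as pd

# Hypothetical data, invented purely for illustration.
orders = pd.DataFrame({
    "customer_id": [1, 2, 2, 4],
    "country": ["BR", "PT", "PT", "BR"],
    "revenue": [100.0, 250.0, 50.0, 75.0],
})
customers = pd.DataFrame({"customer_id": [1, 2, 3], "name": ["Ana", "Rui", "Bia"]})

# .loc picks rows and column in one step, so the write lands in orders itself.
orders.loc[orders["revenue"] < 80, "revenue"] = 80.0

# Proportions instead of raw counts.
shares = orders["country"].value_counts(normalize=True)

# indicator=True adds a _merge column: left_only, right_only, or both.
joined = orders.merge(customers, on="customer_id", how="outer", indicator=True)
unmatched = joined[joined["_merge"] != "both"]

# Measure first: memory for a repetitive string column, before and after category.
labels = pd.DataFrame({"country": ["BR", "PT"] * 5000})
before = labels.memory_usage(deep=True).sum()
after = labels.astype({"country": "category"}).memory_usage(deep=True).sum()

# convert_dtypes() picks nullable dtypes, so a whole-number column with a gap
# becomes Int64 instead of staying float64.
nullable = pd.DataFrame({"n": [1.0, None, 3.0]}).convert_dtypes()
```

Inspecting unmatched is usually the fastest way to find keys that exist on only one side of a join, and comparing before with after tells you whether a dtype change is worth making at all.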

Adopt these and your future self (and your reviewers) will thank you.

