Python - Pandas

Overview

Estimated time: 35–50 minutes

Pandas provides two core data structures for tabular data analysis: Series (a one-dimensional labeled array) and DataFrame (a two-dimensional table of labeled columns).

Learning Objectives

  • Load data from CSV/JSON and inspect DataFrames.
  • Select/filter with loc/iloc; perform joins and groupby aggregations.
  • Understand tidy data principles (one variable per column, one observation per row) and common pitfalls.
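
As a rough illustration of the first objective, the sketch below loads a small CSV and a small JSON document and inspects the results. The sample data and column names (name, score, taken_on) are made up for illustration, and io.StringIO stands in for a real file path.

```python
import io

import pandas as pd

# Hypothetical sample data; io.StringIO stands in for a file on disk.
csv_text = "name,score,taken_on\nAda,95,2024-01-15\nAlan,88,2024-01-16\n"
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())      # first rows of the frame
print(df.dtypes)      # column types as inferred by pandas
df.info()             # index range, non-null counts, memory usage
print(df.describe())  # summary statistics for numeric columns

# JSON in "records" orientation: a list with one object per row.
json_text = '[{"name": "Ada", "score": 95}, {"name": "Alan", "score": 88}]'
df_json = pd.read_json(io.StringIO(json_text), orient="records")
print(df_json.shape)
```

With a real file, the same calls take a path instead, for example pd.read_csv("scores.csv"), where the file name is again only a placeholder.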

Examples

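import pandas as pd

# Small example frame: one row per person.
df = pd.DataFrame({"name": ["Ada", "Alan"], "score": [95, 88]})

# Filter rows and pick columns in a single .loc call (no chained indexing).
print(df.loc[df["score"] > 90, ["name", "score"]])

# Mean score per name.
print(df.groupby("name")["score"].mean())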

Guidance & Patterns

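  • Use loc/iloc explicitly; avoid chained indexing such as df[mask]["col"] = value, which may assign to a temporary copy and leave the original unchanged.
  • Prefer vectorized operations; avoid row-wise apply (axis=1) when possible, since it calls a Python function once per row.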
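
The two patterns above can be sketched as follows. The frame, the third row, and the "scaled" column are illustrative assumptions, not part of the lesson's data.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Ada", "Alan", "Grace"], "score": [95, 88, 91]})

# Label-based selection: boolean row mask plus column labels in one call.
high = df.loc[df["score"] > 90, ["name", "score"]]

# Position-based selection: first two rows, first column.
first_names = df.iloc[:2, 0]

# Assignment through a single .loc call modifies df itself.
# The chained form df[df["score"] < 90]["score"] = 90 may act on a temporary copy.
df.loc[df["score"] < 90, "score"] = 90

# Vectorized arithmetic operates on the whole column at once...
df["scaled"] = df["score"] / 100
# ...instead of the row-wise equivalent, which calls a Python function per row:
# df["scaled"] = df.apply(lambda row: row["score"] / 100, axis=1)

print(high)
print(first_names)
print(df)
```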

Best Practices

  • Keep columns typed correctly: check df.dtypes after loading and parse dates on read (for example, with the parse_dates argument of read_csv).
  • Document assumptions; validate inputs and handle missing values.
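
A minimal sketch of these practices, assuming a hypothetical CSV with a date column (taken_on) and one missing score. Whether to fill or drop missing values depends on the analysis; both options are shown.

```python
import io

import pandas as pd

# Hypothetical data: Alan's score is missing.
csv_text = (
    "name,score,taken_on\n"
    "Ada,95,2024-01-15\n"
    "Alan,,2024-01-16\n"
    "Grace,91,2024-01-17\n"
)

# Parse the date column while reading instead of converting it afterwards.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["taken_on"])
print(df.dtypes)

# Validate an assumption explicitly: each name appears only once.
assert df["name"].is_unique, "expected one row per name"

# Count missing values per column before deciding how to handle them.
print(df.isna().sum())

# Option 1: fill missing scores with the column mean (returns a new frame).
filled = df.assign(score=df["score"].fillna(df["score"].mean()))

# Option 2: drop rows that have no score.
dropped = df.dropna(subset=["score"])
```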

Exercises

  1. Join two DataFrames on a key and compute summary stats.
  2. Reshape data with pivot/melt to achieve tidy form.
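
One possible starting point for both exercises is sketched below. The frames, keys, and column names (student_id, subject, and so on) are invented for illustration; your own data will differ.

```python
import pandas as pd

# Exercise 1: join on a key, then summarize.
students = pd.DataFrame({"student_id": [1, 2, 3], "name": ["Ada", "Alan", "Grace"]})
scores = pd.DataFrame({"student_id": [1, 1, 2], "score": [95, 85, 88]})

# A left join keeps every student, including those without scores.
merged = students.merge(scores, on="student_id", how="left")
summary = merged.groupby("name")["score"].agg(["mean", "count"])
print(summary)

# Exercise 2: reshape a wide table (one column per subject) into tidy form.
wide = pd.DataFrame({"name": ["Ada", "Alan"], "math": [95, 88], "physics": [90, 92]})
tidy = wide.melt(id_vars="name", var_name="subject", value_name="score")
print(tidy)

# pivot reverses the reshape: one row per name, one column per subject.
back = tidy.pivot(index="name", columns="subject", values="score")
print(back)
```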