Introduction to Pandas

🎯 What you'll be able to do after this lesson

After completing this lesson, you'll be able to do the following 3 things.

  • ✅ Create a DataFrame and read or write CSV, Excel, and JSON files
  • ✅ Select, filter, add, and modify rows and columns
  • ✅ Aggregate with groupby, sort, and handle missing values

Keep the learning goals as a checklist, and close the lesson once you can do all of them.

Pandas — Code + Output

Pandas is the de facto standard Python library for working with tabular data. A DataFrame is comparable to an Excel sheet: rows plus named columns. Reading or writing CSV, Excel, JSON, or SQL data takes a single call.


1. Installation + Creating a DataFrame

bash
$ pip install pandas openpyxl    # openpyxl is needed for the .xlsx examples
python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Hong Gil-dong", "Lee Mong-ryong", "Seong Chun-hyang"],
    "Age": [28, 30, 25],
    "Score": [85, 92, 78],
})
print(df)
#                Name  Age  Score
# 0     Hong Gil-dong   28     85
# 1    Lee Mong-ryong   30     92
# 2  Seong Chun-hyang   25     78

2. Reading and Writing CSV / Excel

python
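# Reading (each call returns a new DataFrame)
df = pd.read_csv("students.csv")
df = pd.read_excel("data.xlsx")     # .xlsx support requires openpyxl
df = pd.read_json("data.json")

# Writing
df.to_csv("out.csv", index=False, encoding="utf-8-sig")    # The BOM lets Excel display Korean text correctly
df.to_excel("out.xlsx", index=False)                        # .xlsx support requires openpyxl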
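
The effect of index=False is easier to see with a round trip. The following is a minimal sketch that writes to an in-memory buffer (io.StringIO) instead of a file, so it runs without students.csv; the two-row DataFrame is made up for illustration.

```python
import io

import pandas as pd

df = pd.DataFrame({
    "Name": ["Hong Gil-dong", "Lee Mong-ryong"],
    "Score": [85, 92],
})

# Write CSV text into memory instead of a file on disk
buffer = io.StringIO()
df.to_csv(buffer, index=False)

# Rewind the buffer and read the same text back
buffer.seek(0)
restored = pd.read_csv(buffer)
print(restored.equals(df))        # True

# Without index=False the row index is saved as an extra, unnamed column
with_index = pd.read_csv(io.StringIO(df.to_csv()))
print(list(with_index.columns))   # ['Unnamed: 0', 'Name', 'Score']
```

This is why the examples in this lesson pass index=False when writing.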

3. Preview and Statistics

python
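df.head(3)           # First 3 rows
df.tail(3)           # Last 3 rows
df.shape             # (rows, columns); (3, 3) for the DataFrame from section 1
df.columns           # Index(['Name', 'Age', 'Score'], dtype='object')
df.describe()        # count, mean, std, min, quartiles, max for numeric columns
df.info()            # Column dtypes and non-null counts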
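
As a small sketch of what describe() returns, the snippet here rebuilds the DataFrame from section 1 and pulls single values out of the summary table. The variable name stats is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Hong Gil-dong", "Lee Mong-ryong", "Seong Chun-hyang"],
    "Age": [28, 30, 25],
    "Score": [85, 92, 78],
})

# describe() summarizes numeric columns only, so "Name" is left out
stats = df.describe()
print(list(stats.columns))           # ['Age', 'Score']

# The result is itself a DataFrame, so single values can be looked up by label
print(stats.loc["mean", "Score"])    # 85.0
print(stats.loc["max", "Age"])       # 30.0
```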

4. Selecting Columns and Rows

python
df["Name"]                   # Series (1 column)
df[["Name", "Score"]]         # DataFrame (multiple columns)

df.iloc[0]                   # Row 0 (position-based)
df.iloc[0:2]                 # Rows 0 and 1 (the end position is excluded)
df.loc[df["Age"] >= 28]     # Rows where the condition is True (age 28 or older)
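
Conditions can also be combined, and loc can pick rows and columns in one step. A minimal sketch on the section 1 data follows; the names mask and high are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Hong Gil-dong", "Lee Mong-ryong", "Seong Chun-hyang"],
    "Age": [28, 30, 25],
    "Score": [85, 92, 78],
})

# Combine conditions with & (and) or | (or); each condition needs its own parentheses
mask = (df["Age"] >= 28) & (df["Score"] >= 90)

# loc[rows, columns] filters rows and selects columns at the same time
high = df.loc[mask, ["Name", "Score"]]
print(high["Name"].tolist())     # ['Lee Mong-ryong']

# loc is label-based, iloc is position-based; with the default index they agree here
by_label = df.loc[0, "Name"]
by_position = df.iloc[0, 0]
print(by_label == by_position)   # True
```

Python's plain `and` / `or` do not work on a whole column, which is why the element-wise operators & and | are used.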

5. Adding and Modifying Data

python
# New column
df["Grade"] = df["Score"].apply(lambda x: "A" if x >= 90 else "B")

# Bulk column update
df["Age"] = df["Age"] + 1     # All +1

# Add row (assumes the default 0..n-1 index; values follow the column order)
df.loc[len(df)] = ["Byeon Hak-do", 35, 60, "F"]
print(df)
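
When several rows arrive at once, pd.concat is a common alternative to assigning through loc. The sketch below also grades with a three-way function; the 90 and 80 cutoffs and the name to_grade are assumptions for illustration, not part of the lesson.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Hong Gil-dong", "Lee Mong-ryong"],
    "Score": [85, 92],
})
new_rows = pd.DataFrame({"Name": ["Byeon Hak-do"], "Score": [60]})

# ignore_index=True renumbers the combined rows 0..n-1
combined = pd.concat([df, new_rows], ignore_index=True)

def to_grade(score):
    """Map a score to a letter grade (cutoffs are assumed for this example)."""
    if score >= 90:
        return "A"
    if score >= 80:
        return "B"
    return "F"

combined["Grade"] = combined["Score"].apply(to_grade)
print(combined["Grade"].tolist())   # ['B', 'A', 'F']
```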

6. Group Aggregation

python
df = pd.DataFrame({
    "Region": ["Seoul", "Busan", "Seoul", "Busan"],
    "Sales": [100, 50, 150, 80],
})

# Sum by region
print(df.groupby("Region")["Sales"].sum())
# Region
# Busan    130
# Seoul    250
# Name: Sales, dtype: int64

# Multiple statistics
print(df.groupby("Region")["Sales"].agg(["sum", "mean", "count"]))
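
If the result columns need readable names, named aggregation is one option. This sketch uses the same Region / Sales data; the output column names total, average, and orders are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "Region": ["Seoul", "Busan", "Seoul", "Busan"],
    "Sales": [100, 50, 150, 80],
})

# Each keyword becomes an output column: name=(source column, function)
summary = df.groupby("Region").agg(
    total=("Sales", "sum"),
    average=("Sales", "mean"),
    orders=("Sales", "count"),
)
print(summary.loc["Seoul", "total"])     # 250
print(summary.loc["Busan", "average"])   # 65.0
```

The group keys (Busan, Seoul) become the index of the result, so rows are looked up by region name.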

7. Sorting and Missing Values

python
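# These calls assume the student DataFrame (Name / Age / Score) from sections 1-5.
# Each returns a new DataFrame, so assign the result to keep it.
df.sort_values("Score", ascending=False)        # Score descending
df.drop_duplicates()                            # Remove duplicate rows
df.dropna()                                     # Remove rows containing nulls
df.fillna(0)                                    # Replace nulls with 0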
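
To see that these methods leave the original untouched, here is a minimal sketch on a small made-up DataFrame with one duplicate row and one missing score.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["A", "B", "B", "C"],
    "Score": [85.0, 92.0, 92.0, float("nan")],
})

deduped = df.drop_duplicates()    # drops the second, identical "B" row
no_nulls = df.dropna()            # drops the "C" row with the missing score
filled = df.fillna(0)             # keeps every row, missing score becomes 0
ranked = df.sort_values("Score", ascending=False)   # missing values sort last

print(len(df), len(deduped), len(no_nulls))   # 4 3 3
print(filled["Score"].tolist())               # [85.0, 92.0, 92.0, 0.0]
print(ranked["Name"].iloc[-1])                # C
```

df still has 4 rows and its missing value at the end, because every call returned a new DataFrame.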

One-line Summary

read_csv, groupby with sum/mean, loc[condition], and to_excel: these four cover most everyday analysis tasks.

💻 Pandas Basics
python
import pandas as pd

# Series: a single labeled column
s = pd.Series([10, 20, 30], index=['a','b','c'])
print(s.dtype, s.shape)   # int64 (3,)

# DataFrame
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35],
    'score': [85.5, 90.0, 78.5]
})

# Basic info
print(df.shape)      # (3, 3)
print(df.dtypes)     # Data types by column
df.info()            # Prints dtypes and non-null counts itself (it returns None, so no print())
print(df.describe()) # Descriptive statistics

# Read/Write CSV
df.to_csv('data.csv', index=False, encoding='utf-8-sig')
df = pd.read_csv('data.csv')

# Excel (requires openpyxl)
df.to_excel('data.xlsx', index=False)
df = pd.read_excel('data.xlsx')

💡 Pandas 2.0 Copy-on-Write

Enabling Copy-on-Write in Pandas 2.0:

pd.options.mode.copy_on_write = True

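With Copy-on-Write enabled, every DataFrame or Series derived from another one behaves like a copy, so modifying it never changes the original, and SettingWithCopyWarning is no longer raised. Chained assignment such as df[mask]["col"] = value never updates df in this mode; use df.loc[mask, "col"] = value instead. Copy-on-Write is planned to become the default behavior in pandas 3.0.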
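
A minimal sketch of the behavior, assuming pandas 2.x: it uses pd.option_context to turn the option on only inside the with block, and the two-row DataFrame is made up for illustration.

```python
import pandas as pd

# Enable Copy-on-Write only inside this block (assumes pandas 2.x)
with pd.option_context("mode.copy_on_write", True):
    df = pd.DataFrame({"name": ["Alice", "Bob"], "age": [25, 30]})

    ages = df["age"]     # a Series derived from df
    ages.iloc[0] = 99    # under Copy-on-Write this changes only `ages`

    original_first_age = int(df["age"].iloc[0])
    derived_first_age = int(ages.iloc[0])

print(original_first_age)   # 25
print(derived_first_age)    # 99
```

Without Copy-on-Write, the same assignment could also change df, because ages may share memory with it.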

Python is used across a wide range of fields thanks to its concise, readable syntax. As an interpreted language, it can be run immediately in a REPL. Code conventionally follows the PEP 8 style guide, and formatters such as Black or autopep8 apply it automatically. Type hints improve readability and IDE support. Packages are installed with pip, and virtual environments are created with venv or conda.

🐍 Try it out — Introduction to Pandas

Run the concepts above in actual code. The fastest way to learn is to change the values and see how the behavior changes firsthand.

🤖 Try asking AI like this

Knowing the concepts in this lesson lets you give AI specific instructions. Replacing a vague 'fix this' with a request that uses precise vocabulary is where token savings begin.

  • 'Convert this for loop to a numpy vector operation'
  • 'Refactor this data cleaning step using pandas method chaining'
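
For reference, "method chaining" in the second prompt means writing each cleaning step as one link in a single expression. A rough sketch of what such a refactor might look like, using made-up Region / Sales data with one missing region:

```python
import pandas as pd

raw = pd.DataFrame({
    "Region": ["Seoul", "Busan", "Seoul", "Busan", None],
    "Sales": [100, 50, 150, 80, 70],
})

# Each method returns a new DataFrame, so the steps can be chained top to bottom
result = (
    raw
    .dropna(subset=["Region"])                          # drop rows with no region
    .groupby("Region", as_index=False)["Sales"].sum()   # total sales per region
    .sort_values("Sales", ascending=False)              # largest total first
    .reset_index(drop=True)                             # renumber rows 0..n-1
)
print(result["Region"].tolist())   # ['Seoul', 'Busan']
print(result["Sales"].tolist())    # [250, 130]
```

Knowing the term is what lets you ask for this style directly instead of describing it step by step.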

Why this reduces tokens

If you don't know the concepts, you'll have to follow up an AI response with 'What does that mean?', and that follow-up question is what eats up tokens. Learn the concept once, and the conversation can end in a single round.

Read this first: Introduction to NumPy
Up next: Vibe Coding