Introduction to NumPy


🎯 After reading this lesson

After finishing this lesson, you will be able to confidently do the following three things.

  • ✅ Create NumPy arrays and inspect their shape and dtype
  • ✅ Apply element-wise vector operations, statistics, and boolean indexing
  • ✅ Explain why NumPy is faster than Python lists (vectorization and broadcasting)

Keep the learning goals as a checklist, and close the lesson once you can answer all of them.

NumPy — Code + Output

NumPy is the standard library for numerical computing in Python. Its arrays run at C speed. Pandas is built on NumPy, and PyTorch and TensorFlow provide NumPy-like tensors that convert to and from NumPy arrays.


1. Install + Create Arrays

bash
$ pip install numpy
python
import numpy as np

a = np.array([1, 2, 3, 4, 5])
print(a)             # [1 2 3 4 5]
print(a.shape)       # (5,)
print(a.dtype)       # int64 (int32 on Windows with NumPy 1.x)

# Common creations
np.zeros(5)          # [0. 0. 0. 0. 0.]
np.ones((2, 3))      # Fill 2x3 with 1s
np.arange(0, 10, 2)  # [0 2 4 6 8]
np.linspace(0, 1, 5) # [0. 0.25 0.5 0.75 1.]
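
The dtype shown above is inferred from the values passed in. The following is a minimal sketch of how that inference behaves and how to override it; the variable names are illustrative only.

```python
import numpy as np

ints = np.array([1, 2, 3])                         # all integers -> integer dtype
floats = np.array([1, 2.5, 3])                     # one float promotes the whole array
explicit = np.array([1, 2, 3], dtype=np.float32)   # dtype chosen by hand

print(ints.dtype.kind)   # i  (integer; the exact width depends on the platform)
print(floats.dtype)      # float64
print(explicit.dtype)    # float32

# reshape arranges the same six values into 2 rows x 3 columns
grid = np.arange(6).reshape(2, 3)
print(grid.shape)        # (2, 3)
```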

2. Vector Operations — In One Line

python
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

print(a + b)         # [5 7 9]      ← Element-wise +
print(a * b)         # [ 4 10 18]   ← Element-wise *
print(a * 2)         # [2 4 6]      ← All elements × 2
print(a ** 2)        # [1 4 9]      ← Square
print(a.sum())       # 6
print(a.mean())      # 2.0

The biggest difference from a list: [1,2,3] + [4,5,6] gives [1,2,3,4,5,6] (concatenation), whereas NumPy adds the arrays element by element.
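
To make the contrast concrete, here is a small side-by-side sketch. It applies the same `+` and `*` operators to plain lists and to arrays; the variable names are illustrative.

```python
import numpy as np

py_a, py_b = [1, 2, 3], [4, 5, 6]
np_a, np_b = np.array(py_a), np.array(py_b)

list_sum = py_a + py_b      # lists: + concatenates
list_twice = py_a * 2       # lists: * repeats the list
array_sum = np_a + np_b     # arrays: + adds element by element
array_twice = np_a * 2      # arrays: * multiplies every element

print(list_sum)      # [1, 2, 3, 4, 5, 6]
print(list_twice)    # [1, 2, 3, 1, 2, 3]
print(array_sum)     # [5 7 9]
print(array_twice)   # [2 4 6]
```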


3. 2D Arrays (Matrices)

python
A = np.array([[1, 2, 3],
              [4, 5, 6]])

print(A.shape)       # (2, 3)
print(A[0, 1])       # 2 (row 0, col 1)
print(A[:, 0])       # [1 4]  (all rows, col 0)
print(A.T)           # Transpose — [[1 4], [2 5], [3 6]]
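
With a 2D array, most reductions accept an `axis` argument. A minimal sketch using the same matrix: `axis=0` collapses the rows (one result per column) and `axis=1` collapses the columns (one result per row).

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])

col_sums = A.sum(axis=0)    # add down each column
row_sums = A.sum(axis=1)    # add across each row
row_means = A.mean(axis=1)  # mean of each row

print(col_sums)    # [5 7 9]
print(row_sums)    # [ 6 15]
print(row_means)   # [2. 5.]
```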

4. Statistics

python
data = np.array([85, 92, 78, 90, 88])

print(data.mean())      # 86.6 (mean)
print(data.std())       # 4.88 (population standard deviation)
print(data.max())       # 92
print(data.argmax())    # 1 (index of max value)
print(np.median(data))  # 88.0
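
By default `std()` divides by n (population standard deviation). If the data is a sample, passing `ddof=1` divides by n - 1 instead. A short sketch on the same scores:

```python
import numpy as np

data = np.array([85, 92, 78, 90, 88])

population_std = data.std()       # ddof=0 (default): divide by n
sample_std = data.std(ddof=1)     # ddof=1: divide by n - 1

print(f"{population_std:.2f}")    # 4.88
print(f"{sample_std:.2f}")        # 5.46
```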

5. Conditions — Boolean Indexing

python
a = np.array([1, -2, 3, -4, 5])

positives_only = a[a > 0]                    # [1 3 5]
a[a < 0] = 0                          # replace negative values with 0 (in place)
print(a)                              # [1 0 3 0 5]
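
Two related patterns are worth knowing; this is a sketch, not an exhaustive list. `np.where` builds a new array instead of modifying the original, and conditions can be combined with `&` or `|` as long as each one is wrapped in parentheses.

```python
import numpy as np

a = np.array([1, -2, 3, -4, 5])

clipped = np.where(a < 0, 0, a)     # new array; a itself is left unchanged
in_range = a[(a > 0) & (a < 5)]     # each condition needs its own parentheses
negative_count = (a < 0).sum()      # True counts as 1, so sum() counts matches

print(clipped)          # [1 0 3 0 5]
print(in_range)         # [1 3]
print(negative_count)   # 2
```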

6. Speed — Compared to Python Lists

python
import time

# Python list
lst = list(range(10_000_000))
start = time.perf_counter()
result = [x * 2 for x in lst]
print(f"list: {time.perf_counter()-start:.2f}s")     # roughly 0.5s; varies by machine

# NumPy
arr = np.arange(10_000_000)
start = time.perf_counter()
result = arr * 2
print(f"numpy: {time.perf_counter()-start:.2f}s")    # roughly 0.02s (about 25x faster here)
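
A single timed run can be noisy. As a hedged alternative, the standard-library `timeit` module repeats the measurement; the sketch below uses a smaller array and takes the best of three repeats. The actual numbers depend on your machine, so no expected output is shown.

```python
import timeit
import numpy as np

size = 1_000_000
lst = list(range(size))
arr = np.arange(size)

runs = 5  # each measurement executes the statement this many times

# timeit.repeat returns one total time per repeat; min() is the least noisy
list_time = min(timeit.repeat(lambda: [x * 2 for x in lst], number=runs, repeat=3))
numpy_time = min(timeit.repeat(lambda: arr * 2, number=runs, repeat=3))

print(f"list : {list_time / runs:.4f}s per run")
print(f"numpy: {numpy_time / runs:.4f}s per run")
print(f"speedup: {list_time / numpy_time:.0f}x")
```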

One-Line Summary

np.array() + vector operations + boolean indexing + statistics = the starting point for data analysis.

💻 Getting Started with NumPy
# pip install numpy
import numpy as np

# Python list vs NumPy
python_list = [1, 2, 3, 4, 5]
numpy_array = np.array([1, 2, 3, 4, 5])

# Speed comparison (NumPy is much faster)
import time

size = 1000000
py_list = list(range(size))
np_arr = np.arange(size)

# Python list
start = time.perf_counter()
result = [x * 2 for x in py_list]
print(f"Python: {time.perf_counter() - start:.4f}s")

# NumPy array
start = time.perf_counter()
result = np_arr * 2
print(f"NumPy: {time.perf_counter() - start:.4f}s")

# NumPy advantages
# 1. Vectorized operations (no loops needed)
arr = np.array([1, 2, 3])
print(arr + 10)      # [11 12 13]
print(arr * 2)       # [2 4 6]
print(arr ** 2)      # [1 4 9]

# 2. Broadcasting
a = np.array([[1], [2], [3]])
b = np.array([10, 20, 30])
print(a + b)  # shapes (3, 1) + (3,) -> (3, 3): [[11 21 31], [12 22 32], [13 23 33]]

💡 Key Points

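1. The core loops are implemented in C, so array operations run far faster than pure-Python loops
2. Vectorization: operations on whole arrays without explicit loops
3. Broadcasting: arrays of different shapes are aligned automatically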
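
Broadcasting compares shapes from the last dimension backwards; two dimensions are compatible when they are equal or one of them is 1. The sketch below uses a hypothetical `scores` table to show a case that works, the shape that results, and a case that fails.

```python
import numpy as np

# Hypothetical data: 2 students x 3 subjects
scores = np.array([[80, 90, 70],
                   [60, 70, 80]])          # shape (2, 3)

col_means = scores.mean(axis=0)            # shape (3,)
centered = scores - col_means              # (2, 3) - (3,) broadcasts to (2, 3)
print(centered)

# Ask NumPy what shape two inputs would broadcast to
result_shape = np.broadcast_shapes((3, 1), (3,))
print(result_shape)                        # (3, 3)

# Incompatible trailing dimensions (3 vs 2) raise ValueError
try:
    np.ones((2, 3)) + np.ones(2)
except ValueError as exc:
    print("cannot broadcast:", exc)
```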

Python has concise, readable syntax and is used across a wide range of fields. As an interpreted language, it can be run immediately in a REPL. Code conventionally follows the PEP 8 style guide, and formatters such as Black or autopep8 apply it automatically. Type hints improve readability and IDE support. Use pip for package management and venv or conda for virtual environments.

🐍 Try Running It — Introduction to NumPy

Try running the concepts above as actual code. The fastest way to learn is to change the values and see for yourself how they behave.

🤖 Try Asking AI Like This

Knowing the concepts in this lesson lets you give specific instructions to AI. Instead of a vague 'fix this,' make vocabulary-driven requests — that is where token savings begin.

  • 'Convert this for loop into a numpy vector operation'
  • 'Refactor this data cleaning into pandas method chaining'
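
As an illustration of what the first request might produce, here is a sketch of a loop and one possible vectorized equivalent. The `prices` data and the 10% discount rule are hypothetical, chosen only to show the pattern.

```python
import numpy as np

prices = [120.0, 80.0, 200.0, 50.0]   # hypothetical input data

# Loop version: apply a 10% discount to prices of 100 or more
discounted_loop = []
for p in prices:
    if p >= 100:
        discounted_loop.append(p * 0.9)
    else:
        discounted_loop.append(p)

# Vectorized version: the condition and both branches are whole-array operations
arr = np.array(prices)
discounted_vec = np.where(arr >= 100, arr * 0.9, arr)

print(np.allclose(discounted_loop, discounted_vec))   # True
```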

Why This Reduces Tokens

When you don't know the concepts, even after receiving an AI response you have to ask 'What does that mean?' again. That follow-up question is what consumes tokens. Learn the concept once, and the conversation finishes in a single exchange.
