NumPy for Numerical Computing
Learn NumPy arrays, vectorized operations, broadcasting, linear algebra, and random number generation for scientific Python.
NumPy is the foundation of the Python data science stack. Its N-dimensional arrays and vectorized operations run in compiled code and are typically 10 to 100 times faster than equivalent pure Python loops.
Installation and Basics
pip install numpy
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(arr.dtype) # int64 on most platforms
print(arr.shape) # (5,)
print(arr.ndim) # 1
Creating Arrays
np.zeros((3, 4)) # 3x4 array of zeros
np.ones((2, 3)) # 2x3 array of ones
np.full((2, 2), 7) # fill with 7
np.eye(3) # 3x3 identity matrix
np.arange(0, 10, 2) # [0, 2, 4, 6, 8]
np.linspace(0, 1, 5) # 5 evenly spaced values 0 to 1
np.random.rand(3, 3) # uniform random [0, 1)
np.random.randn(3, 3) # standard normal
np.random.randint(0, 100, size=(3, 3))
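The `np.random.rand`-style functions above belong to NumPy's legacy interface. Below is a minimal sketch of the same three draws using the `Generator` API from `np.random.default_rng`, which makes reproducibility explicit. The seed value 42 and the variable names are arbitrary choices for illustration.

```python
import numpy as np

# A seeded generator: the same seed reproduces the same stream of numbers.
rng = np.random.default_rng(42)  # 42 is an arbitrary illustrative seed

uniform = rng.random((3, 3))                  # uniform on [0, 1)
normal = rng.standard_normal((3, 3))          # standard normal
integers = rng.integers(0, 100, size=(3, 3))  # integers in [0, 100)

# A second generator built from the same seed yields identical draws.
rng_again = np.random.default_rng(42)
same_uniform = rng_again.random((3, 3))
print(np.array_equal(uniform, same_uniform))  # True
```

Passing a generator object around, rather than relying on global random state, tends to make experiments easier to reproduce.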
Array Operations — Vectorized
a = np.array([1, 2, 3, 4])
b = np.array([10, 20, 30, 40])
a + b # [11, 22, 33, 44]
a * 2 # [2, 4, 6, 8]
a ** 2 # [1, 4, 9, 16]
np.sqrt(a) # element-wise sqrt
np.sin(a) # element-wise sin
No loops needed: each operation runs over every element in compiled code instead of a Python-level loop.
Multi-Dimensional Arrays
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(matrix.shape) # (3, 3)
print(matrix[1, 2]) # 6
print(matrix[:, 1]) # [2, 5, 8] — second column
print(matrix[1, :]) # [4, 5, 6] — second row
Broadcasting
When shapes differ but are compatible, NumPy automatically stretches the smaller array across the larger one:
matrix = np.array([[1, 2, 3], [4, 5, 6]]) # shape (2, 3)
row = np.array([10, 20, 30]) # shape (3,)
matrix + row # broadcasts row to each row of matrix
# [[11, 22, 33], [14, 25, 36]]
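Broadcasting compares shapes starting from the trailing dimension; two dimensions are compatible when they are equal or when one of them is 1. The sketch below repeats the row example and adds a column case and a failing case. The `col` array is a hypothetical addition, not part of the example above.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)
row = np.array([10, 20, 30])               # shape (3,), treated as (1, 3)
col = np.array([[100], [200]])             # shape (2, 1), hypothetical example

by_row = matrix + row  # row is added to every row of matrix
by_col = matrix + col  # col is added to every column of matrix

print(by_row.tolist())  # [[11, 22, 33], [14, 25, 36]]
print(by_col.tolist())  # [[101, 102, 103], [204, 205, 206]]

# Trailing dimensions 3 and 2 are neither equal nor 1, so this fails.
try:
    matrix + np.array([1, 2])
except ValueError as exc:
    print("cannot broadcast:", exc)
```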
Aggregations
data = np.array([[1, 2, 3], [4, 5, 6]])
data.sum() # 21
data.mean() # 3.5
data.max() # 6
data.sum(axis=0) # [5, 7, 9] column sums
data.sum(axis=1) # [6, 15] row sums
data.std() # population standard deviation (ddof=0)
np.median(data) # median
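Two options on these aggregations are easy to miss: `keepdims`, which keeps the reduced axis so the result broadcasts back against the input, and `ddof`, which switches `std` between the population and sample formulas. A small sketch, with variable names chosen here for illustration:

```python
import numpy as np

data = np.array([[1, 2, 3], [4, 5, 6]])

# keepdims=True keeps the reduced axis as length 1, so the result
# has shape (2, 1) and broadcasts against the (2, 3) input.
row_means = data.mean(axis=1, keepdims=True)
centered = data - row_means  # every row now has mean 0

# std() defaults to the population formula (ddof=0);
# ddof=1 divides by n - 1 and gives the sample standard deviation.
population_std = data.std()
sample_std = data.std(ddof=1)

print(centered.tolist())  # [[-1.0, 0.0, 1.0], [-1.0, 0.0, 1.0]]
print(population_std < sample_std)  # True
```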
Boolean Indexing
arr = np.array([1, 5, 3, 8, 2, 9, 4])
arr[arr > 4] # [5, 8, 9]
arr[(arr > 2) & (arr < 8)] # [5, 3, 4]
matrix = np.random.randint(0, 10, size=(4, 4))
matrix[matrix % 2 == 0] = 0 # zero out even values
Linear Algebra
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
np.dot(A, B) # matrix multiplication
A @ B # same (Python 3.5+)
np.linalg.inv(A) # inverse
np.linalg.det(A) # determinant
np.linalg.eig(A) # eigenvalues and eigenvectors
# Solve Ax = b
b = np.array([1, 2])
x = np.linalg.solve(A, b)
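A quick way to sanity-check a solution is to substitute it back into the system. The sketch below does that for the example above and compares `solve` with the explicit-inverse route, which gives the same answer here but is generally slower and less accurate for larger systems. The name `x_via_inverse` is illustrative.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
b = np.array([1, 2])

x = np.linalg.solve(A, b)

# Substitute back: A @ x should reproduce b up to floating-point error.
print(np.allclose(A @ x, b))  # True

# Same result via the explicit inverse, shown only for comparison.
x_via_inverse = np.linalg.inv(A) @ b
print(np.allclose(x, x_via_inverse))  # True
```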
Reshaping
arr = np.arange(12)
m = arr.reshape(3, 4) # 3x4 matrix; arr itself keeps its shape
arr.reshape(2, 3, 2) # 3D array
m.flatten() # back to 1D (returns a copy)
m.T # transpose, shape (4, 3)
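Passing `-1` for one dimension asks NumPy to infer it from the total number of elements. A brief sketch, including the error raised when the sizes cannot match; the names `m`, `t`, and `cube` are illustrative:

```python
import numpy as np

arr = np.arange(12)

# -1 lets NumPy work out the missing dimension from the array size.
m = arr.reshape(3, -1)        # 12 / 3 = 4 columns
cube = arr.reshape(2, -1, 2)  # 12 / (2 * 2) = 3 in the middle
t = m.T                       # transpose swaps the two axes

print(m.shape, cube.shape, t.shape)  # (3, 4) (2, 3, 2) (4, 3)

# 12 elements cannot be split into 5 equal rows, so this fails.
try:
    arr.reshape(5, -1)
except ValueError as exc:
    print("reshape failed:", exc)
```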
Performance: NumPy vs Python
import time
size = 1_000_000
python_list = list(range(size))
numpy_arr = np.arange(size)
start = time.perf_counter() # perf_counter is a monotonic, high-resolution timer
result = [x ** 2 for x in python_list]
print(f"Python list: {time.perf_counter() - start:.4f}s")
start = time.perf_counter()
result = numpy_arr ** 2
print(f"NumPy array: {time.perf_counter() - start:.4f}s")
# NumPy is typically 10-100x faster
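A single timed run can be noisy. The sketch below uses the standard library's `timeit` to repeat each measurement and keep the best time. The smaller `size`, the explicit `int64` dtype (to avoid overflow on platforms with a 32-bit default integer), and the repeat counts are choices made here for illustration; the actual speedup depends on the machine and the operation.

```python
import timeit

import numpy as np

size = 100_000  # smaller than above so the sketch runs quickly
python_list = list(range(size))
numpy_arr = np.arange(size, dtype=np.int64)

# Run each statement 5 times per trial, 3 trials, and keep the fastest trial.
loop_time = min(timeit.repeat(lambda: [x ** 2 for x in python_list],
                              number=5, repeat=3))
vec_time = min(timeit.repeat(lambda: numpy_arr ** 2,
                             number=5, repeat=3))

print(f"Python list: {loop_time:.4f}s")
print(f"NumPy array: {vec_time:.4f}s")
print(f"Speedup: {loop_time / vec_time:.1f}x")
```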
NumPy is the building block for Pandas, Scikit-learn, Matplotlib, and virtually all scientific Python libraries.
Data Types (dtype)
NumPy arrays are homogeneous — all elements share one type:
arr = np.array([1, 2, 3], dtype=np.float64)
arr_int = np.array([1.7, 2.3, 3.9], dtype=np.int32) # truncates to [1, 2, 3]
# Memory-efficient types for large datasets
big = np.zeros(1_000_000, dtype=np.float32) # half the memory of float64
Common dtypes: int32, int64, float32, float64, bool, complex128.
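The memory claim can be checked directly with the `nbytes` attribute, and `astype` converts an existing array to another dtype. The sketch also shows one cost of narrow integer types: array arithmetic wraps around silently on overflow. Variable names are illustrative.

```python
import numpy as np

big64 = np.zeros(1_000_000, dtype=np.float64)
big32 = big64.astype(np.float32)  # astype returns a converted copy

print(big64.nbytes)  # 8000000 (8 bytes per element)
print(big32.nbytes)  # 4000000 (4 bytes per element)

# uint8 holds 0..255; 250 + 10 wraps around to 4 without an error.
small = np.array([250], dtype=np.uint8)
step = np.array([10], dtype=np.uint8)
wrapped = small + step
print(wrapped)  # [4]
```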
Advanced Indexing
arr = np.arange(10)
# Fancy indexing — select specific positions
arr[[0, 3, 7]] # [0, 3, 7]
matrix = np.arange(12).reshape(3, 4)
rows = [0, 2]
cols = [1, 3]
matrix[rows, cols] # elements at (0,1) and (2,3)
# np.where — conditional selection
arr = np.array([1, 5, 3, 8, 2])
np.where(arr > 4, arr, 0) # keep values > 4, else 0 → [0, 5, 0, 8, 0]
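Indexing with two lists pairs them up element by element, which is not the same as selecting a rectangular block. A short sketch of the difference using `np.ix_`, plus `np.flatnonzero` to get the positions where a condition holds. The names `pairs`, `block`, and `positions` are illustrative.

```python
import numpy as np

matrix = np.arange(12).reshape(3, 4)
rows = [0, 2]
cols = [1, 3]

# Paired indexing: picks (0, 1) and (2, 3) only.
pairs = matrix[rows, cols]
print(pairs.tolist())  # [1, 11]

# np.ix_ builds an open mesh: every listed row crossed with every listed column.
block = matrix[np.ix_(rows, cols)]
print(block.tolist())  # [[1, 3], [9, 11]]

# Positions (not values) where the condition is true.
arr = np.array([1, 5, 3, 8, 2])
positions = np.flatnonzero(arr > 4)
print(positions.tolist())  # [1, 3]
```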
Stacking and Splitting
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
np.vstack([a, b]) # vertical stack → 2x3
np.hstack([a, b]) # horizontal → [1, 2, 3, 4, 5, 6]
np.concatenate([a, b])
big = np.arange(12)
np.split(big, 3) # three equal arrays of length 4
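For 2D inputs, the `axis` argument of `np.concatenate` decides whether arrays are joined as extra rows or extra columns. `np.split` requires an even division, while `np.array_split` tolerates a remainder. A sketch with small hypothetical arrays:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])  # shape (2, 2)
b = np.array([[5, 6]])          # shape (1, 2)

# axis=0 appends b as a new row; axis=1 appends b.T as a new column.
as_rows = np.concatenate([a, b], axis=0)    # shape (3, 2)
as_cols = np.concatenate([a, b.T], axis=1)  # shape (2, 3)
print(as_rows.tolist())  # [[1, 2], [3, 4], [5, 6]]
print(as_cols.tolist())  # [[1, 2, 5], [3, 4, 6]]

# 10 elements do not divide evenly by 3: array_split allows unequal parts.
parts = np.array_split(np.arange(10), 3)
print([len(p) for p in parts])  # [4, 3, 3]
```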
Saving and Loading Arrays
arr = np.random.rand(1000, 10)
# Binary format — fast, preserves dtype
np.save("data.npy", arr)
loaded = np.load("data.npy")
# Compressed
np.savez_compressed("data.npz", train=arr[:800], test=arr[800:])
# Text (human-readable, slower)
np.savetxt("data.csv", arr, delimiter=",")
text = np.loadtxt("data.csv", delimiter=",")
For production pipelines, prefer .npy/.npz over CSV for raw numeric arrays: the binary formats load faster and preserve dtype and shape exactly.
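Arrays stored in an `.npz` archive are read back by the keyword names used when saving. The sketch below writes to a temporary directory so it leaves no files behind; the seed and the temporary path are choices made here for illustration.

```python
import tempfile
from pathlib import Path

import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed for a repeatable example
arr = rng.random((1000, 10))

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "data.npz"
    np.savez_compressed(path, train=arr[:800], test=arr[800:])

    # np.load returns a lazy archive; index it by the saved names.
    with np.load(path) as archive:
        names = sorted(archive.files)
        train = archive["train"]
        test = archive["test"]

print(names)                    # ['test', 'train']
print(train.shape, test.shape)  # (800, 10) (200, 10)
```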
Universal Functions (ufuncs)
NumPy functions work element-wise on entire arrays:
np.add(a, b) # same as a + b
np.maximum(a, b) # element-wise max
np.clip(arr, 0, 10) # values below 0 → 0, above 10 → 10
np.log(arr + 1) # safe log (avoid log(0))
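Ufuncs also carry methods such as `reduce` and `accumulate`, and NumPy provides `np.log1p` as a more accurate way to compute log(1 + x) for small x. A minimal sketch with self-contained example arrays (the values are illustrative):

```python
import numpy as np

a = np.array([1, 2, 3])
values = np.array([1, 5, 3, 8, 2])

# reduce folds the ufunc over the array: equivalent to a.sum() here.
total = np.add.reduce(a)
print(total)  # 6

# accumulate keeps every intermediate result: a running maximum.
running_max = np.maximum.accumulate(values)
print(running_max.tolist())  # [1, 5, 5, 8, 8]

# log1p(x) computes log(1 + x) and is defined at x == 0.
small = np.array([0.0, 1e-10])
log_small = np.log1p(small)
print(log_small[0])  # 0.0
```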
Common Pitfalls
- Views vs copies: basic slicing returns a view, so modifying the slice also changes the original; call .copy() when you need independent data
- Wrong axis: on a 2D array, sum(axis=0) collapses the rows (column sums) while sum(axis=1) collapses the columns (row sums)
- Mixing lists and arrays: convert early with np.array(my_list)
- Looping over rows: prefer vectorized operations; Python-level loops defeat NumPy's purpose
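The first pitfall is the easiest to trip over, so here is a small sketch of it. Basic slices share memory with the original array, while `.copy()` and fancy indexing produce independent data. The variable names are illustrative.

```python
import numpy as np

original = np.arange(6)

# Basic slicing returns a view on the same underlying buffer.
view = original[1:4]
view[0] = 100
print(original.tolist())  # [0, 100, 2, 3, 4, 5]

# An explicit copy is independent: writing to it leaves original alone.
safe = original[1:4].copy()
safe[0] = -1
print(original[1])  # 100

# Fancy indexing (a list of positions) also returns a copy.
fancy = original[[1, 2, 3]]
print(np.shares_memory(original, view))   # True
print(np.shares_memory(original, fancy))  # False
```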
Next: Pandas builds on NumPy for labeled tabular data.