NumPy for Numerical Computing

Learn NumPy arrays, vectorized operations, broadcasting, linear algebra, and random number generation for scientific Python.

NumPy is the foundation of the Python data science stack. Its N-dimensional arrays and vectorized operations run their inner loops in compiled C code. As a result they are typically 10-100x faster than equivalent pure Python loops.

Installation and Basics

  pip install numpy

  import numpy as np

arr = np.array([1, 2, 3, 4, 5])
print(arr.dtype)    # int64 on most 64-bit platforms (int32 on Windows before NumPy 2.0)
print(arr.shape)    # (5,)
print(arr.ndim)     # 1

Creating Arrays

  np.zeros((3, 4))          # 3x4 array of zeros
np.ones((2, 3))           # 2x3 array of ones
np.full((2, 2), 7)         # fill with 7
np.eye(3)                  # 3x3 identity matrix
np.arange(0, 10, 2)        # [0, 2, 4, 6, 8]
np.linspace(0, 1, 5)       # [0, 0.25, 0.5, 0.75, 1], endpoint included
np.random.rand(3, 3)       # uniform random [0, 1)
np.random.randn(3, 3)      # standard normal
np.random.randint(0, 100, size=(3, 3))  # integers in [0, 100), high excluded
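The np.random.rand, randn and randint calls above use NumPy's legacy global random state. For new code, NumPy recommends creating a Generator with np.random.default_rng. This makes seeding explicit and results reproducible. A minimal sketch follows; the seed value 42 is arbitrary:

```python
import numpy as np

# Create an independent Generator; the seed (42, arbitrary) makes results reproducible
rng = np.random.default_rng(42)

uniform = rng.random((3, 3))                  # uniform floats in [0, 1)
normal = rng.standard_normal((3, 3))          # standard normal samples
ints = rng.integers(0, 100, size=(3, 3))      # integers in [0, 100); high is exclusive by default

# A second Generator with the same seed reproduces the same stream
rng_again = np.random.default_rng(42)
same_uniform = rng_again.random((3, 3))
```

Passing a Generator around as an argument, instead of relying on global state, keeps separate parts of a program from disturbing each other's random streams.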

Array Operations — Vectorized

  a = np.array([1, 2, 3, 4])
b = np.array([10, 20, 30, 40])

a + b          # [11, 22, 33, 44]
a * 2          # [2, 4, 6, 8]
a ** 2         # [1, 4, 9, 16]
np.sqrt(a)     # element-wise sqrt
np.sin(a)      # element-wise sin

No Python loops needed: each operation is applied element-wise to the whole array in compiled code.
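A vectorized expression is a loop moved into compiled code. To see this, the sketch below computes the same result both ways. The helper names squared_distance_loop and squared_distance_vectorized are hypothetical, chosen for illustration only:

```python
import numpy as np

def squared_distance_loop(x, y):
    # Hypothetical helper: pure Python, visits each pair of elements explicitly
    total = 0.0
    for xi, yi in zip(x, y):
        total += (xi - yi) ** 2
    return total

def squared_distance_vectorized(x, y):
    # Hypothetical helper: subtraction, squaring and summation run element-wise in compiled code
    return np.sum((x - y) ** 2)

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([10.0, 20.0, 30.0, 40.0])
loop_result = squared_distance_loop(x, y)        # 2430.0
vec_result = squared_distance_vectorized(x, y)   # 2430.0
```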

Multi-Dimensional Arrays

  matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(matrix.shape)   # (3, 3)
print(matrix[1, 2])   # 6
print(matrix[:, 1])   # [2, 5, 8] — second column
print(matrix[1, :])   # [4, 5, 6] — second row
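You can slice both axes at once to extract a sub-matrix. A step value selects every n-th element, and negative indices count from the end. A short sketch using the same 3x3 matrix:

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

top_right = matrix[:2, 1:]    # rows 0-1, columns 1-2 → [[2, 3], [5, 6]]
corners = matrix[::2, ::2]    # every second row and column → [[1, 3], [7, 9]]
last_row = matrix[-1]         # last row → [7, 8, 9]
```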

Broadcasting

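When operand shapes differ, NumPy broadcasts them. It compares shapes starting from the trailing dimension. Dimensions that are equal, or where one side is 1, are stretched to match without copying data: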

  matrix = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)
row = np.array([10, 20, 30])                # shape (3,)

matrix + row  # broadcasts row to each row of matrix
# [[11, 22, 33], [14, 25, 36]]
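The same rule works along the other axis when the operand has shape (2, 1): its size-1 dimension is stretched across the columns. If a dimension is neither equal nor 1, the operation raises a ValueError. A sketch:

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)
col = np.array([[100], [200]])              # shape (2, 1)

# (2, 3) + (2, 1): the size-1 axis is stretched to 3 columns
shifted = matrix + col                      # [[101, 102, 103], [204, 205, 206]]
result_shape = np.broadcast_shapes(matrix.shape, col.shape)   # (2, 3)

# (2, 3) + (2,): trailing dimensions 3 and 2 differ and neither is 1
try:
    matrix + np.array([1, 2])
    broadcast_failed = False
except ValueError:
    broadcast_failed = True
```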

Aggregations

  data = np.array([[1, 2, 3], [4, 5, 6]])

data.sum()          # 21
data.mean()         # 3.5
data.max()          # 6
data.sum(axis=0)    # [5, 7, 9]  column sums
data.sum(axis=1)    # [6, 15]    row sums
data.std()          # ≈ 1.708, population standard deviation (ddof=0 by default)
np.median(data)     # 3.5 (median is a function, not an array method)
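Axis aggregations combine naturally with broadcasting. With keepdims=True, the reduced axis is kept with size 1, so the result broadcasts back against the original array. A sketch that scales each row to sum to 1 and centers each column:

```python
import numpy as np

data = np.array([[1, 2, 3], [4, 5, 6]])

row_sums = data.sum(axis=1, keepdims=True)   # shape (2, 1): [[6], [15]]
normalized = data / row_sums                 # each row divided by its own sum

# mean(axis=0) has shape (3,), so it broadcasts over the rows
centered = data - data.mean(axis=0)          # [[-1.5, -1.5, -1.5], [1.5, 1.5, 1.5]]
```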

Boolean Indexing

  arr = np.array([1, 5, 3, 8, 2, 9, 4])
arr[arr > 4]              # [5, 8, 9]
arr[(arr > 2) & (arr < 8)]  # [5, 3, 4]

matrix = np.random.randint(0, 10, size=(4, 4))
matrix[matrix % 2 == 0] = 0  # zero out even values
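A boolean mask is itself an array of True/False values. You can count it, combine it with |, & and ~, or pass it to np.any and np.all. Each comparison needs its own parentheses because & and | bind more tightly than > and <. A sketch:

```python
import numpy as np

arr = np.array([1, 5, 3, 8, 2, 9, 4])
mask = arr > 4                          # [False, True, False, True, False, True, False]

count = np.count_nonzero(mask)          # number of True entries → 3
outside = arr[(arr < 2) | (arr > 8)]    # either condition → [1, 9]
not_big = arr[~mask]                    # inverted mask → [1, 3, 2, 4]
any_negative = np.any(arr < 0)          # False
```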

Linear Algebra

  A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

np.dot(A, B)         # matrix multiplication
A @ B                # same (Python 3.5+)
np.linalg.inv(A)     # inverse
np.linalg.det(A)     # determinant, ≈ -2.0 (floating point)
np.linalg.eig(A)     # returns (eigenvalues, eigenvectors)

# Solve Ax = b
b = np.array([1, 2])
x = np.linalg.solve(A, b)   # ≈ [0, 0.5]
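Prefer np.linalg.solve over np.linalg.inv(A) @ b: it is generally faster and numerically more stable. Results are floating point, so check them with np.allclose rather than ==. A sketch with the same A and b that also verifies the eigen-decomposition:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
b = np.array([1, 2])

x = np.linalg.solve(A, b)              # solves A @ x == b directly, ≈ [0, 0.5]
x_via_inv = np.linalg.inv(A) @ b       # same answer, but usually slower and less stable

residual_ok = np.allclose(A @ x, b)    # check the solution up to rounding error

eigenvalues, eigenvectors = np.linalg.eig(A)
# Column i of eigenvectors satisfies A @ v_i == eigenvalues[i] * v_i
eig_ok = np.allclose(A @ eigenvectors, eigenvectors * eigenvalues)
```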

Reshaping

  arr = np.arange(12)
m = arr.reshape(3, 4)   # 3x4 matrix (arr itself stays 1D)
arr.reshape(2, 3, 2)    # 3D array
m.flatten()             # back to 1D (always returns a copy)
m.T                     # transpose → shape (4, 3); .T on a 1D array does nothing
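You can pass -1 for one dimension in reshape, and NumPy infers it from the total size. The dimensions must still multiply to the original element count. A sketch that also contrasts ravel, which returns a view when possible, with flatten, which always copies:

```python
import numpy as np

arr = np.arange(12)

auto = arr.reshape(3, -1)        # -1 is inferred as 4 → shape (3, 4)
column = arr.reshape(-1, 1)      # column vector, shape (12, 1)

# 12 elements cannot fill a 5 x n grid, so this raises ValueError
try:
    arr.reshape(5, -1)
    bad_reshape_raised = False
except ValueError:
    bad_reshape_raised = True

flat_view = auto.ravel()         # view: shares memory with arr here
flat_copy = auto.flatten()       # independent copy
```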

Performance: NumPy vs Python

  import time
import numpy as np

size = 1_000_000
python_list = list(range(size))
numpy_arr = np.arange(size)

# perf_counter is a high-resolution monotonic clock suited to timing code
start = time.perf_counter()
result = [x ** 2 for x in python_list]
print(f"Python list: {time.perf_counter() - start:.4f}s")

start = time.perf_counter()
result = numpy_arr ** 2
print(f"NumPy array: {time.perf_counter() - start:.4f}s")
# NumPy is typically 10-100x faster; exact numbers depend on hardware

NumPy is the building block for Pandas, Scikit-learn, Matplotlib, and virtually all scientific Python libraries.

Data Types (dtype)

NumPy arrays are homogeneous — all elements share one type:

  arr = np.array([1, 2, 3], dtype=np.float64)
arr_int = np.array([1.7, 2.3, 3.9], dtype=np.int32)  # truncates to [1, 2, 3]

# Memory-efficient types for large datasets
big = np.zeros(1_000_000, dtype=np.float32)  # half the memory of float64

Common dtypes: int32, int64, float32, float64, bool, complex128.
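Every array reports its memory use through itemsize and nbytes, so the float32 saving is easy to check. To convert an existing array, use astype, which returns a new array. A sketch:

```python
import numpy as np

f64 = np.zeros(1_000_000, dtype=np.float64)
f32 = np.zeros(1_000_000, dtype=np.float32)

print(f64.itemsize, f64.nbytes)   # 8 bytes per element, 8000000 bytes total
print(f32.itemsize, f32.nbytes)   # 4 bytes per element, 4000000 bytes total

floats = np.array([1.7, 2.3, 3.9])
ints = floats.astype(np.int32)                # float → int truncates toward zero: [1, 2, 3]
rounded = np.round(floats).astype(np.int32)   # round first to get [2, 2, 4]
```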

Advanced Indexing

  arr = np.arange(10)

# Fancy indexing — select specific positions
arr[[0, 3, 7]]           # [0, 3, 7]

matrix = np.arange(12).reshape(3, 4)
rows = [0, 2]
cols = [1, 3]
matrix[rows, cols]       # elements at (0,1) and (2,3) → [1, 11]

# np.where — conditional selection
arr = np.array([1, 5, 3, 8, 2])
np.where(arr > 4, arr, 0)  # keep values > 4, else 0 → [0, 5, 0, 8, 0]
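Called with only a condition, np.where returns the indices where the condition is true, just like np.nonzero. The result is a tuple with one index array per dimension, which pairs naturally with fancy indexing. A sketch:

```python
import numpy as np

arr = np.array([1, 5, 3, 8, 2])
(idx,) = np.where(arr > 4)        # indices of matching elements → [1, 3]
values = arr[idx]                 # fancy indexing with those indices → [5, 8]

matrix = np.arange(12).reshape(3, 4)
rows, cols = np.where(matrix % 5 == 0)   # one index array per dimension
# matrix holds 0, 5 and 10 at positions (0, 0), (1, 1) and (2, 2)
```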

Stacking and Splitting

  a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

np.vstack([a, b])    # vertical stack → 2x3
np.hstack([a, b])    # horizontal → [1, 2, 3, 4, 5, 6]
np.concatenate([a, b])  # same as hstack for 1D arrays → [1, 2, 3, 4, 5, 6]

big = np.arange(12)
np.split(big, 3)     # three equal arrays of length 4
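For 2D arrays, np.concatenate takes an axis argument: axis=0 stacks rows like vstack, and axis=1 joins columns like hstack. np.split requires the array to divide evenly, whereas np.array_split accepts uneven pieces. A sketch:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

rows = np.concatenate([a, b], axis=0)   # shape (4, 2), like np.vstack
cols = np.concatenate([a, b], axis=1)   # shape (2, 4): [[1, 2, 5, 6], [3, 4, 7, 8]]

big = np.arange(10)
try:
    np.split(big, 3)                    # 10 is not divisible by 3 → ValueError
    split_raised = False
except ValueError:
    split_raised = True

pieces = np.array_split(big, 3)         # uneven pieces of length 4, 3, 3
```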

Saving and Loading Arrays

  arr = np.random.rand(1000, 10)

# Binary format — fast, preserves dtype
np.save("data.npy", arr)
loaded = np.load("data.npy")

# Compressed
np.savez_compressed("data.npz", train=arr[:800], test=arr[800:])

# Text (human-readable, slower)
np.savetxt("data.csv", arr, delimiter=",")
text = np.loadtxt("data.csv", delimiter=",")

For production pipelines, prefer .npy/.npz over CSV for raw numeric arrays.
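np.load on an .npz file returns a lazy, dict-like object keyed by the names passed to savez_compressed. The sketch below does a full round trip. It writes to a temporary directory so no data.npz file is left behind; that directory handling is an assumption for the example only:

```python
import os
import tempfile

import numpy as np

arr = np.random.default_rng(0).random((1000, 10))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.npz")
    np.savez_compressed(path, train=arr[:800], test=arr[800:])

    # The context manager closes the underlying file afterwards
    with np.load(path) as npz:
        keys = sorted(npz.files)        # ['test', 'train']
        train = npz["train"]            # each array is read when accessed
        test = npz["test"]
```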

Universal Functions (ufuncs)

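Universal functions (ufuncs) such as np.add and np.maximum operate element-wise on whole arrays and follow the same broadcasting rules as the arithmetic operators. Related helpers such as np.clip behave the same way: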

  np.add(a, b)       # same as a + b
np.maximum(a, b)   # element-wise max
np.clip(arr, 0, 10)  # values below 0 → 0, above 10 → 10
np.log1p(arr)      # log(1 + arr): avoids log(0) when arr >= 0, more accurate than np.log(arr + 1) near 0
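Ufuncs also have methods: reduce, accumulate and outer. These give sums, running totals and pairwise tables without writing loops. A sketch:

```python
import numpy as np

arr = np.array([1, 5, 3, 8, 2])

total = np.add.reduce(arr)                # same as arr.sum() → 19
running_sum = np.add.accumulate(arr)      # same as np.cumsum(arr) → [1, 6, 9, 17, 19]
running_max = np.maximum.accumulate(arr)  # largest value seen so far → [1, 5, 5, 8, 8]
table = np.multiply.outer([1, 2, 3], [1, 2, 3])  # 3x3 multiplication table
```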

Common Pitfalls

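Views vs copies: basic slicing returns a view, so modifying the slice modifies the original. Fancy and boolean indexing return copies. Call .copy() when you need an independent array
Wrong axis: on a 2D array, sum(axis=0) collapses the rows (one result per column), while sum(axis=1) collapses the columns (one result per row)
Mixing lists and arrays: convert early with np.array(my_list)
Looping over rows: prefer vectorized operations; Python loops defeat NumPy's purpose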
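The views-vs-copies pitfall is easiest to see in action. In the sketch below, a basic slice shares memory with its source, while a fancy-indexed selection and an explicit .copy() do not:

```python
import numpy as np

original = np.arange(5)

view = original[1:4]          # basic slice → view onto the same memory
view[0] = 100                 # also changes original[1]

selected = original[[1, 2]]   # fancy indexing → independent copy
selected[0] = -1              # original is unaffected

safe = original[1:4].copy()   # explicit copy when independence is needed
safe[:] = 0                   # original is unaffected

# original is now [0, 100, 2, 3, 4]
```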

Next: Pandas builds on NumPy for labeled tabular data.
