NumPy is the foundation upon which almost all other Data Science libraries are built. It provides support for large, multi-dimensional arrays and matrices.
Key features include vectorization (operating on whole arrays without explicit for loops), broadcasting, and linear algebra functions.
Example:
Python
import numpy as np
# Creating a 2D array (Matrix)
matrix = np.array([[1, 2], [3, 4]])
# Element-wise operations (Vectorization)
result = matrix * 2 # Result: [[2, 4], [6, 8]]
# Calculating Mean
print(np.mean(matrix)) # Result: 2.5
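The example above shows vectorization only. As a minimal sketch of broadcasting and linear algebra, the snippet below reuses the same 2×2 matrix. The names `row`, `shifted`, `product`, and `inverse` are illustrative and not part of the original example.

```python
import numpy as np

matrix = np.array([[1, 2], [3, 4]])

# Broadcasting: the 1D row is stretched across both rows of the matrix
row = np.array([10, 20])
shifted = matrix + row  # [[11, 22], [13, 24]]

# Linear algebra: matrix product and inverse
product = matrix @ matrix  # [[7, 10], [15, 22]]
inverse = np.linalg.inv(matrix)

# A matrix times its inverse is (numerically) the identity matrix
print(np.allclose(matrix @ inverse, np.eye(2)))  # True
```

Note that `*` multiplies element by element, while `@` performs a true matrix product.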
If NumPy is the engine, Pandas is the dashboard. It introduces two primary data structures: the Series (1D) and the DataFrame (2D/Table).
Key features include handling missing data (NaN), merging/joining datasets, and “Group By” operations.
Example:
Python
import pandas as pd
data = {'Name': ['Alice', 'Bob'], 'Age': [25, 30]}
df = pd.DataFrame(data)
# Filtering data
seniors = df[df['Age'] > 28]
# Reading a CSV file
# df = pd.read_csv('data.csv')
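Missing data, merging, and “Group By” are not covered by the filtering example, so here is a rough sketch of all three. The data is invented for illustration: the `Team` and `City` columns, the extra person, and the choice to fill the gap with the mean are assumptions, not part of the original.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; one age is missing on purpose
people = pd.DataFrame({
    "Name": ["Alice", "Bob", "Cara"],
    "Age": [25, 30, np.nan],
    "Team": ["A", "B", "A"],
})
teams = pd.DataFrame({"Team": ["A", "B"], "City": ["Oslo", "Lima"]})

# Missing data: fill the NaN with the mean of the known ages (27.5)
people["Age"] = people["Age"].fillna(people["Age"].mean())

# Merge: attach each person's city via the shared "Team" column
merged = people.merge(teams, on="Team", how="left")

# Group By: average age per team
mean_age = merged.groupby("Team")["Age"].mean()
print(mean_age)
```

Filling with the mean is only one option; `dropna()` would discard the incomplete row instead, and which is appropriate depends on the analysis.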
Matplotlib is the “grandfather” of Python visualization. It provides low-level control over every element of a figure.
Example:
Python
import matplotlib.pyplot as plt
x = [1, 2, 3, 4]
y = [10, 20, 25, 30]
plt.plot(x, y, color='green', marker='o')
plt.title("Growth Over Time")
plt.xlabel("Days")
plt.ylabel("Sales")
plt.show()
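To make the “low-level control” point concrete, the same plot can be built with Matplotlib's object-oriented interface, where the figure and axes are explicit objects. This is a sketch, not the only way to do it. The `Agg` backend and the output file name `growth.png` are assumptions chosen so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen so no display is needed
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [10, 20, 25, 30]

# Create the figure and axes explicitly instead of relying on plt's implicit state
fig, ax = plt.subplots(figsize=(6, 4))
line, = ax.plot(x, y, color="green", marker="o", linewidth=2)

# Individual elements can be adjusted one by one
ax.set_title("Growth Over Time")
ax.set_xlabel("Days")
ax.set_ylabel("Sales")
ax.set_xticks(x)                          # one tick per day
ax.grid(True, linestyle="--", alpha=0.5)  # light dashed grid
ax.spines["top"].set_visible(False)       # hide the top border

fig.savefig("growth.png", dpi=150)  # hypothetical output file name
```

In an interactive session, dropping the `matplotlib.use("Agg")` line and calling `plt.show()` instead of `savefig` displays the figure in a window.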
Seaborn is built on top of Matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics.
Example:
Python
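import matplotlib.pyplot as plt
import seaborn as sns
# Load a built-in example dataset (downloaded on first use, so this needs internet access)
tips = sns.load_dataset("tips")
# Create a violin plot of total bill by day
sns.violinplot(x="day", y="total_bill", data=tips)
plt.show()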
SciPy builds on NumPy to provide a large number of functions that operate on NumPy arrays and help with scientific programming.
scipy.optimize: For finding the minimum or maximum of a function.
scipy.integrate: For numerical integration and solving differential equations.
scipy.linalg: Advanced linear algebra.
Example:
Python
import numpy as np
from scipy import optimize
# Finding a minimum of a simple function: f(x) = x^2 + 10sin(x)
def f(x):
    return x**2 + 10*np.sin(x)
result = optimize.minimize(f, x0=0)
print(result.x) # The x at the local minimum found from x0=0 (about -1.31)
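The other two submodules listed above can be sketched in the same spirit. The integrand, the differential equation, and the linear system below are arbitrary illustrations chosen because their exact answers are known; the tolerances passed to `solve_ivp` are likewise an assumption.

```python
import numpy as np
from scipy import integrate, linalg

# scipy.integrate: definite integral of sin(x) from 0 to pi (exact value is 2)
area, abs_error = integrate.quad(np.sin, 0, np.pi)
print(area)

# scipy.integrate: solve dy/dt = -2y with y(0) = 1 for t in [0, 1]
solution = integrate.solve_ivp(
    lambda t, y: -2 * y, t_span=(0, 1), y0=[1.0], rtol=1e-8, atol=1e-10
)
y_end = solution.y[0, -1]  # value at t = 1, close to exp(-2)

# scipy.linalg: solve the linear system A x = b
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = linalg.solve(A, b)  # [2., 3.]
print(x)
```

`quad` returns both the estimate and an error bound, which is why two values are unpacked.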
While SciPy provides general mathematical tools, Statsmodels is specifically for statistical modeling.
Example:
Python
import statsmodels.api as sm
# Define variables
X = [1, 2, 3, 4, 5] # Independent variable
y = [2, 4, 5, 4, 5] # Dependent variable
# Add a constant (intercept)
X = sm.add_constant(X)
# Fit Ordinary Least Squares (OLS) model
model = sm.OLS(y, X).fit()
# Print detailed statistical report
print(model.summary())
| Library | Primary Purpose | Best Known For |
| --- | --- | --- |
| NumPy | Math & Logic | Fast Arrays & Matrices |
| Pandas | Data Manipulation | DataFrames (Excel for Python) |
| Matplotlib | Basic Visualization | Total control over plots |
| Seaborn | Statistical Visualization | Beautiful, complex charts with less code |
| SciPy | Advanced Science/Math | Optimization, Integration, Signal Processing |
| Statsmodels | Statistics | Formal statistical tests & Regression summaries |