
Programming Foundations (Python for AI & Machine Learning)

Python is the backbone of modern AI and Machine Learning. Almost every major AI framework, research paper implementation, and production ML system uses Python because of its simplicity, flexibility, and powerful ecosystem.
This module builds strong programming foundations, focusing on how Python is actually used in AI/ML projects, not just syntax.

1. Python Basics for Machine Learning

Python is an interpreted, high-level language that allows rapid experimentation — a critical requirement in ML where models are built, tested, and improved continuously.

Why Python is Preferred in AI/ML

  • Simple, readable syntax
  • Massive AI/ML libraries
  • Strong community support
  • Fast prototyping
  • Easy integration with C/C++ for performance

Basic Python Concepts (ML Context)

Variables store data used in training and prediction.

Example:

learning_rate = 0.01
epochs = 100

Python does not require explicit type declaration, which speeds up experimentation.
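
For example, the same variable name can be rebound to a value of a different type while experimenting (the names below are illustrative):

threshold = 0.5        # a float, with no type declaration
threshold = "auto"     # the same name can later hold a string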

2. Data Types & Control Flow

ML code deals with large volumes of data, so understanding data types and flow control is critical.

Core Data Types

Integers & Floats
Used for numerical features, weights, and loss values.

age = 25
accuracy = 0.92

Strings
Used for labels, file paths, and class names.

label = "spam"

Booleans
Used in conditions and flags.

is_trained = True

Collections (Extremely Important for ML)

Lists
Used to store datasets and predictions.

scores = [0.85, 0.90, 0.88]

Tuples
Used for immutable configurations.

input_shape = (224, 224, 3)

Dictionaries
Used for feature mapping and configurations.

params = {"lr": 0.01, "epochs": 50}

Sets
Used for unique values.
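
unique_labels = {0, 1, 2}   # duplicates are removed automatically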

Control Flow

Conditional Statements
Used for decision-making.

if accuracy > 0.9:
    print("Model is performing well")

Loops
Used for training epochs and batch processing.

for epoch in range(epochs):
    train_model()

3. Functions & Modules

Functions allow reusability and abstraction, which is crucial in ML pipelines.

Functions

A function represents a single logical operation.

def calculate_accuracy(y_true, y_pred):
    correct = sum(y_true[i] == y_pred[i] for i in range(len(y_true)))
    return correct / len(y_true)

Good ML functions (see the sketch after this list):

  • Do one task
  • Have clear inputs and outputs
  • Avoid hard-coded values
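
For instance, moving a hard-coded threshold into a parameter keeps a function reusable; a minimal sketch (the function name is illustrative):

# Avoid: the threshold is hidden inside the function
def passes_quality_check(accuracy):
    return accuracy > 0.9

# Prefer: one task, clear inputs and outputs, no magic numbers
def passes_quality_check(accuracy, threshold=0.9):
    return accuracy > threshold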

Modules

Modules help organize large ML projects.

Example file structure:

project/
 ├── data_loader.py
 ├── model.py
 ├── train.py
 ├── evaluate.py

Usage:

from data_loader import load_data

4. Object-Oriented Programming (OOP) Basics in Python

OOP helps structure complex ML systems like models, datasets, and pipelines.

Why OOP in ML?

  • Encapsulates model behavior
  • Makes code scalable
  • Easier experimentation

Class Example (ML Model Skeleton)

class LinearRegressionModel:
    def __init__(self, lr):
        self.lr = lr
        self.weights = None

    def train(self, X, y):
        pass

    def predict(self, X):
        pass
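
Assuming the skeleton above, a model is an object created from the class; X_train, y_train, and X_test stand in for your data:

model = LinearRegressionModel(lr=0.01)   # the constructor (__init__) stores the learning rate
model.train(X_train, y_train)            # calling a method on the object
predictions = model.predict(X_test)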

Key OOP Concepts:

  • Class
  • Object
  • Constructor
  • Methods
  • Encapsulation

5. File Handling

ML systems rely heavily on datasets stored in files.

Reading Files

with open("data.txt", "r") as file:
    data = file.readlines()

Writing Files

with open("results.txt", "w") as file:
    file.write("Accuracy: 92%")

Use cases:

  • Loading datasets
  • Saving model outputs
  • Logging training progress (see the sketch below)
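
For the logging use case, opening the file in append mode ("a") adds a line per epoch instead of overwriting the file; a minimal sketch (the file name and variables are illustrative):

with open("training_log.txt", "a") as log_file:
    log_file.write(f"Epoch {epoch}: loss = {loss:.4f}\n")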

6. Exception Handling

ML code often fails due to:

  • Missing files
  • Invalid data
  • Shape mismatches

Exception handling prevents crashes.

Example

try:
    data = load_data("dataset.csv")
except FileNotFoundError:
    print("Dataset file not found")
except Exception as e:
    print("Unexpected error:", e)

Good practice:

  • Catch specific exceptions
  • Provide meaningful error messages

7. Virtual Environments

AI projects depend on specific library versions.

Virtual environments isolate dependencies.

Why Virtual Environments?

  • Avoid version conflicts
  • Reproducible experiments
  • Clean project setup

Creating a Virtual Environment

python -m venv venv
source venv/bin/activate   # Linux/Mac
venv\Scripts\activate      # Windows

Installing Libraries

pip install numpy pandas scikit-learn
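
Recording Dependencies

To keep experiments reproducible, the exact installed versions can be saved to the requirements.txt file used in the project structure below:

pip freeze > requirements.txt      # record exact library versions
pip install -r requirements.txt    # recreate the same environment elsewhere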

8. Writing Clean, Modular ML Code

Clean code is critical for collaboration and debugging.

Best Practices

  • Use meaningful variable names
  • Avoid hard-coded values
  • Keep functions small
  • Separate logic into files
  • Add comments where logic is complex

Example: Clean ML Function

def train_model(X, y, epochs, lr):
    for epoch in range(epochs):
        loss = compute_loss(X, y)   # forward pass and loss computation
        update_weights(lr)          # parameter update using the learning rate

Project Structure (Professional Standard)

ml_project/
 ├── data/
 ├── notebooks/
 ├── src/
 │    ├── preprocessing.py
 │    ├── model.py
 │    ├── train.py
 │    └── utils.py
 ├── requirements.txt
 └── README.md

Complete Code Examples

Module 1: Python Basics for Machine Learning

1.2 Setting Up Python Environment

# Check Python version (3.8+ recommended for ML)
python --version

# Using pip (Python package manager)
pip install numpy pandas matplotlib scikit-learn

# Check installed packages
pip list

1.3 Python Interpreter and Scripts

# Interactive mode (REPL - Read-Eval-Print Loop)
>>> 2 + 2
4
>>> print("Hello ML!")
Hello ML!

# Script mode (save as script.py and run)
# python script.py

1.4 Comments and Documentation

# Single-line comment

"""
Multi-line comment or docstring
Used for documentation
"""

def train_model(data):
    """
    Trains a machine learning model.
    
    Args:
        data: Training dataset
    
    Returns:
        Trained model object
    """
    pass

Module 2: Data Types & Control Flow

2.1 Basic Data Types

Numbers:

# Integers
num_samples = 1000
num_features = 784  # For MNIST dataset

# Floats (essential for ML calculations)
learning_rate = 0.001
accuracy = 0.95

# Complex numbers (used in signal processing)
z = 3 + 4j

# Type conversion
x = int(3.7)      # 3
y = float(5)      # 5.0
z = str(100)      # "100"

Strings:

# String creation
model_name = "Random Forest"
dataset = 'CIFAR-10'
description = """Multi-line string
for longer descriptions"""

# String operations (useful for file paths, logging)
path = "/data/train/"
filename = "model_v1.pkl"
full_path = path + filename  # Concatenation

# String formatting (for logging results)
epoch = 10
loss = 0.234
print(f"Epoch {epoch}: Loss = {loss:.4f}")
# Output: Epoch 10: Loss = 0.2340

# String methods
text = "machine learning"
print(text.upper())        # MACHINE LEARNING
print(text.split())        # ['machine', 'learning']
print(text.replace("machine", "deep"))  # deep learning

Booleans:

# Boolean values (for flags and conditions)
is_training = True
has_converged = False

# Boolean operations
model_ready = (is_training and not has_converged)

# Comparison operators
accuracy > 0.9
loss <= 0.1
epoch != max_epochs

2.2 Data Structures

Lists (Most versatile, like arrays):

# Creating lists
features = [1, 2, 3, 4, 5]
mixed_data = [1, "feature", 3.14, True]

# Indexing (0-based)
first_feature = features[0]      # 1
last_feature = features[-1]      # 5

# Slicing (very important for ML data manipulation)
first_three = features[0:3]      # [1, 2, 3]
last_two = features[-2:]         # [4, 5]
every_second = features[::2]     # [1, 3, 5]

# List operations
features.append(6)               # Add to end
features.insert(0, 0)           # Insert at position
features.remove(3)              # Remove by value
popped = features.pop()         # Remove and return last

# List comprehension (powerful for data processing)
squared = [x**2 for x in features]
filtered = [x for x in features if x > 2]

# Practical ML example
train_indices = [i for i in range(len(dataset)) if i % 5 != 0]
test_indices = [i for i in range(len(dataset)) if i % 5 == 0]

Tuples (Immutable, good for fixed data):

# Creating tuples
image_shape = (28, 28, 3)  # Height, Width, Channels
train_val_split = (0.8, 0.2)

# Unpacking (very useful)
height, width, channels = image_shape
train_ratio, val_ratio = train_val_split

# Tuples are immutable
# image_shape[0] = 32  # This would raise an error

# Use case: returning multiple values from functions
def get_data_stats(data):
    return len(data), data.mean(), data.std()

size, mean, std = get_data_stats(dataset)

Dictionaries (Key-value pairs, essential for configs):

# Creating dictionaries
model_config = {
    'learning_rate': 0.001,
    'batch_size': 32,
    'epochs': 100,
    'optimizer': 'Adam'
}

# Accessing values
lr = model_config['learning_rate']
optimizer = model_config.get('optimizer', 'SGD')  # With default

# Adding/modifying
model_config['momentum'] = 0.9
model_config['learning_rate'] = 0.0001  # Update

# Dictionary methods
print(model_config.keys())      # Get all keys
print(model_config.values())    # Get all values
print(model_config.items())     # Get key-value pairs

# Iterating
for key, value in model_config.items():
    print(f"{key}: {value}")

# Dictionary comprehension
squared_dict = {x: x**2 for x in range(5)}
# {0: 0, 1: 1, 2: 4, 3: 9, 4: 16}

# Nested dictionaries (for complex configs)
experiment_config = {
    'model': {
        'type': 'CNN',
        'layers': [64, 128, 256]
    },
    'training': {
        'epochs': 50,
        'batch_size': 32
    }
}

# Accessing nested values
model_type = experiment_config['model']['type']

Sets (Unique elements, useful for data cleaning):

# Creating sets
unique_labels = {0, 1, 2, 3, 4}
another_set = set([1, 2, 2, 3, 3, 3])  # {1, 2, 3}

# Set operations
train_ids = {1, 2, 3, 4, 5}
val_ids = {4, 5, 6, 7}

intersection = train_ids & val_ids  # {4, 5} (overlap)
union = train_ids | val_ids         # All unique IDs
difference = train_ids - val_ids    # {1, 2, 3}

# Use case: finding unique classes
all_predictions = [0, 1, 1, 2, 0, 3, 2, 1]
unique_classes = set(all_predictions)  # {0, 1, 2, 3}

2.3 Control Flow

If-Else Statements:

# Basic if-else
accuracy = 0.95

if accuracy > 0.9:
    print("Excellent model!")
elif accuracy > 0.7:
    print("Good model")
else:
    print("Needs improvement")

# Ternary operator (one-liner)
status = "Pass" if accuracy > 0.8 else "Fail"

# ML example: Early stopping logic (returns the updated state so the caller can track it)
def check_early_stopping(val_loss, best_loss, patience_counter, patience):
    if val_loss < best_loss:
        best_loss = val_loss          # new best validation loss
        patience_counter = 0
        print("New best model!")
    else:
        patience_counter += 1
        if patience_counter >= patience:
            print("Early stopping triggered")
            return True, best_loss, patience_counter
    return False, best_loss, patience_counter

Loops:

For loops (iterating over sequences):

# Basic for loop
epochs = 10
for epoch in range(epochs):
    print(f"Training epoch {epoch + 1}")

# Iterating over lists
features = ['age', 'income', 'education']
for feature in features:
    print(f"Processing feature: {feature}")

# Enumerate (get index and value)
for idx, feature in enumerate(features):
    print(f"Feature {idx}: {feature}")

# Range with start, stop, step
for i in range(0, 100, 10):  # 0, 10, 20, ..., 90
    print(f"Processing batch {i}")

# Nested loops (training loop example)
for epoch in range(num_epochs):
    for batch_idx, (data, labels) in enumerate(train_loader):
        # Training logic here
        if batch_idx % 10 == 0:
            print(f"Epoch {epoch}, Batch {batch_idx}")

# List comprehension (faster alternative)
squared_features = [x**2 for x in range(10)]

While loops (condition-based):

# Basic while loop
loss = 1.0
epoch = 0
max_epochs = 1000

while loss > 0.01 and epoch < max_epochs:
    # Training step
    loss = loss * 0.95  # Simulated decrease
    epoch += 1
    print(f"Epoch {epoch}: Loss = {loss:.4f}")

# While with break
while True:
    user_input = input("Continue training? (y/n): ")
    if user_input.lower() == 'n':
        break
    # Continue training

# While with continue
batch_idx = 0
while batch_idx < len(dataset):
    if dataset[batch_idx] is None:
        batch_idx += 1
        continue  # Skip this batch
    # Process batch
    batch_idx += 1

Loop Control:

# Break: Exit loop immediately
for epoch in range(100):
    train_loss = train_model()
    if train_loss < threshold:
        print(f"Converged at epoch {epoch}")
        break

# Continue: Skip to next iteration
for sample_id in range(len(dataset)):
    if is_corrupted(dataset[sample_id]):
        continue  # Skip corrupted data
    process(dataset[sample_id])

# Pass: Placeholder (do nothing)
for epoch in range(num_epochs):
    pass  # TODO: Implement training loop

2.4 Type Hints (Python 3.5+, important for clean ML code)

from typing import List, Dict, Tuple, Optional

def preprocess_data(
    data: List[float],
    labels: List[int],
    normalize: bool = True
) -> Tuple[List[float], List[int]]:
    """
    Preprocesses data with type hints for clarity.
    """
    if normalize:
        mean = sum(data) / len(data)
        data = [(x - mean) for x in data]
    return data, labels

# Optional types (can be None)
def load_model(path: Optional[str] = None) -> Dict:
    if path is None:
        return {}  # Return empty config
    # Load from path

# Complex types
DataPoint = Tuple[List[float], int]  # (features, label)
Dataset = List[DataPoint]

def split_dataset(
    dataset: Dataset,
    ratio: float = 0.8
) -> Tuple[Dataset, Dataset]:
    split_idx = int(len(dataset) * ratio)
    return dataset[:split_idx], dataset[split_idx:]

Module 3: Functions & Modules

3.1 Functions Basics

Defining Functions:

# Basic function
def train_epoch(model, data, labels):
    """Trains model for one epoch."""
    # Training logic
    loss = 0.0
    # ... compute loss
    return loss

# Function with default arguments
def create_model(
    input_size: int,
    hidden_size: int = 128,
    output_size: int = 10,
    activation: str = 'relu'
):
    """Creates neural network with defaults."""
    model = {
        'input': input_size,
        'hidden': hidden_size,
        'output': output_size,
        'activation': activation
    }
    return model

# Usage
model1 = create_model(784)  # Uses defaults
model2 = create_model(784, hidden_size=256, activation='tanh')

Return Values:

# Single return
def calculate_accuracy(predictions, labels):
    correct = sum([p == l for p, l in zip(predictions, labels)])
    return correct / len(labels)

# Multiple returns (as tuple)
def evaluate_model(model, test_data, test_labels):
    predictions = model.predict(test_data)
    accuracy = calculate_accuracy(predictions, test_labels)
    loss = calculate_loss(predictions, test_labels)
    return accuracy, loss, predictions

# Unpacking returns
acc, loss, preds = evaluate_model(model, X_test, y_test)

# Named returns using dictionary
def get_metrics(predictions, labels):
    return {
        'accuracy': calculate_accuracy(predictions, labels),
        'precision': calculate_precision(predictions, labels),
        'recall': calculate_recall(predictions, labels)
    }

metrics = get_metrics(preds, labels)
print(f"Accuracy: {metrics['accuracy']}")

Function Arguments:

# Positional arguments
def train(model, data, labels, epochs):
    pass

train(my_model, X_train, y_train, 100)

# Keyword arguments (more readable for ML)
train(
    model=my_model,
    data=X_train,
    labels=y_train,
    epochs=100
)

# *args - Variable number of positional arguments
def ensemble_predict(*models):
    """Combines predictions from multiple models."""
    predictions = []
    for model in models:
        pred = model.predict()
        predictions.append(pred)
    return average(predictions)

result = ensemble_predict(model1, model2, model3)

# **kwargs - Variable number of keyword arguments
def configure_model(**config):
    """Flexible model configuration."""
    learning_rate = config.get('learning_rate', 0.001)
    batch_size = config.get('batch_size', 32)
    optimizer = config.get('optimizer', 'Adam')
    
    print(f"LR: {learning_rate}, Batch: {batch_size}, Opt: {optimizer}")

configure_model(learning_rate=0.01, momentum=0.9, weight_decay=0.0001)

# Combining all argument types
def train_model(model, data, *regularizers, epochs=10, **hyperparams):
    """
    model: positional
    data: positional
    *regularizers: variable positional
    epochs: keyword with default
    **hyperparams: variable keyword
    """
    pass

Lambda Functions (Anonymous functions):

# Basic lambda
square = lambda x: x**2
print(square(5))  # 25

# Lambda with multiple arguments
multiply = lambda x, y: x * y
print(multiply(3, 4))  # 12

# Common use: sorting by custom key
data_points = [(1, 0.5), (2, 0.3), (3, 0.9)]
sorted_by_loss = sorted(data_points, key=lambda x: x[1])
# [(2, 0.3), (1, 0.5), (3, 0.9)]

# Lambda with map/filter
features = [1, 2, 3, 4, 5]
normalized = list(map(lambda x: (x - 3) / 2, features))
filtered = list(filter(lambda x: x > 2, features))

# ML example: applying transformation
import math

def apply_activation(values, activation='relu'):
    activations = {
        'relu': lambda x: max(0, x),
        'sigmoid': lambda x: 1 / (1 + math.exp(-x)),
        'tanh': lambda x: math.tanh(x)
    }
    return [activations[activation](v) for v in values]

3.2 Scope and Closures

# Global vs Local scope
learning_rate = 0.001  # Global

def train():
    loss = 0.0  # Local to train()
    print(learning_rate)  # Can access global
    # print(batch_loss)  # Error: not defined here

def validate():
    batch_loss = 0.5  # Local to validate()
    # print(loss)  # Error: not accessible

# Modifying global variables
counter = 0

def increment():
    global counter  # Explicitly declare as global
    counter += 1

# Closures (functions that remember enclosing scope)
def create_trainer(learning_rate):
    """Returns a training function with fixed learning rate."""
    def train(model, loss):
        # This function "closes over" learning_rate
        model.update(loss * learning_rate)
    return train

# Create specialized trainers
fast_trainer = create_trainer(0.01)
slow_trainer = create_trainer(0.0001)

# Practical example: Creating preprocessors
def create_normalizer(mean, std):
    """Factory function for normalization."""
    def normalize(value):
        return (value - mean) / std
    return normalize

# Create normalizer for specific dataset
dataset_mean = 0.5
dataset_std = 0.2
normalizer = create_normalizer(dataset_mean, dataset_std)

# Use it
normalized_value = normalizer(0.7)

3.3 Decorators (Advanced but useful for ML)

import time
import functools

# Basic decorator: timing function execution
def timer(func):
    """Decorator to measure execution time."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start_time = time.time()
        result = func(*args, **kwargs)
        end_time = time.time()
        print(f"{func.__name__} took {end_time - start_time:.4f} seconds")
        return result
    return wrapper

@timer
def train_model(epochs):
    """Training takes time..."""
    time.sleep(2)  # Simulated training
    return "Model trained"

result = train_model(100)

# Logging decorator
def log_execution(func):
    """Logs function calls."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        print(f"Calling {func.__name__} with args={args}, kwargs={kwargs}")
        result = func(*args, **kwargs)
        print(f"{func.__name__} returned {result}")
        return result
    return wrapper

@log_execution
def predict(model, data):
    return model.forward(data)

# Stacking decorators
@timer
@log_execution
def train_epoch(model, data):
    pass  # Training logic

# Decorator with arguments
def validate_inputs(min_value, max_value):
    """Decorator factory."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(value):
            if not (min_value <= value <= max_value):
                raise ValueError(f"Value must be between {min_value} and {max_value}")
            return func(value)
        return wrapper
    return decorator

@validate_inputs(0.0, 1.0)
def set_learning_rate(lr):
    print(f"Learning rate set to {lr}")

set_learning_rate(0.01)  # OK
# set_learning_rate(1.5)  # Raises ValueError

3.4 Modules and Packages

Creating a Module:

# File: ml_utils.py
"""
Utility functions for machine learning.
"""

def normalize_data(data, mean=None, std=None):
    """Normalizes data to zero mean and unit variance."""
    if mean is None:
        mean = sum(data) / len(data)
    if std is None:
        std = (sum((x - mean)**2 for x in data) / len(data)) ** 0.5
    
    normalized = [(x - mean) / std for x in data]
    return normalized, mean, std

def split_data(data, labels, ratio=0.8):
    """Splits data into train and test sets."""
    split_idx = int(len(data) * ratio)
    return (
        data[:split_idx],
        labels[:split_idx],
        data[split_idx:],
        labels[split_idx:]
    )

# Module-level variables
DEFAULT_RANDOM_SEED = 42

Using Modules:

# Import entire module
import ml_utils

data, mean, std = ml_utils.normalize_data([1, 2, 3, 4, 5])
print(ml_utils.DEFAULT_RANDOM_SEED)

# Import specific functions
from ml_utils import normalize_data, split_data

data, mean, std = normalize_data([1, 2, 3])

# Import with alias
from ml_utils import normalize_data as norm

data, _, _ = norm([1, 2, 3])

# Import all (not recommended for large modules)
from ml_utils import *

Creating a Package:

ml_project/
│
├── __init__.py           # Makes it a package
├── preprocessing/
│   ├── __init__.py
│   ├── normalization.py
│   └── feature_engineering.py
├── models/
│   ├── __init__.py
│   ├── linear_models.py
│   └── neural_nets.py
└── utils/
    ├── __init__.py
    └── metrics.py

__init__.py example:

# ml_project/__init__.py
"""
ML Project Package
"""

__version__ = '1.0.0'
__author__ = 'Your Name'

# Import key functions for easy access
from .preprocessing.normalization import normalize
from .models.linear_models import LinearRegression

# Package-level configuration
DEFAULT_CONFIG = {
    'random_seed': 42,
    'test_size': 0.2
}

Using the package:

# Import from package
from ml_project.preprocessing.normalization import normalize
from ml_project.models.linear_models import LinearRegression
from ml_project import DEFAULT_CONFIG

# Or use package-level imports
import ml_project

model = ml_project.LinearRegression()
data = ml_project.normalize(raw_data)

Relative Imports (within package):

# In ml_project/models/neural_nets.py
from ..preprocessing.normalization import normalize  # Go up one level
from .linear_models import LinearRegression          # Same level
from ..utils.metrics import accuracy                 # Up then down

Module 4: Object-Oriented Programming (OOP) for ML

4.1 Classes and Objects

Basic Class:

class NeuralNetwork:
    """A simple neural network class."""
    
    def __init__(self, input_size, hidden_size, output_size):
        """
        Constructor - called when creating new object.
        
        Args:
            input_size: Number of input features
            hidden_size: Number of hidden neurons
            output_size: Number of output classes
        """
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.weights1 = None
        self.weights2 = None
        self.is_trained = False
    
    def initialize_weights(self):
        """Initializes network weights."""
        import random
        self.weights1 = [[random.random() for _ in range(self.hidden_size)]
                         for _ in range(self.input_size)]
        self.weights2 = [[random.random() for _ in range(self.output_size)]
                         for _ in range(self.hidden_size)]
    
    def train(self, X, y, epochs=100):
        """Trains the network."""
        self.initialize_weights()
        for epoch in range(epochs):
            # Training logic
            pass
        self.is_trained = True
        print(f"Training completed for {epochs} epochs")
    
    def predict(self, X):
        """Makes predictions."""
        if not self.is_trained:
            raise ValueError("Model must be trained first!")
        # Prediction logic
        return []
    
    def get_info(self):
        """Returns model information."""
        return {
            'input_size': self.input_size,
            'hidden_size': self.hidden_size,
            'output_size': self.output_size,
            'is_trained': self.is_trained
        }

# Creating objects (instances)
model1 = NeuralNetwork(784, 128, 10)  # MNIST-like architecture
model2 = NeuralNetwork(100, 64, 2)    # Binary classification

# Using methods
model1.train(X_train, y_train, epochs=50)
predictions = model1.predict(X_test)
info = model1.get_info()
print(info)

4.2 Attributes and Methods

class Dataset:
    """Dataset class for managing ML data."""
    
    # Class attributes (shared by all instances)
    supported_formats = ['csv', 'json', 'parquet']
    num_instances = 0
    
    def __init__(self, data, labels, name="Unnamed"):
        # Instance attributes (unique to each object)
        self.data = data
        self.labels = labels
        self.name = name
        self.size = len(data)
        self._preprocessed = False  # Private attribute (convention)
        
        # Increment class counter
        Dataset.num_instances += 1
    
    # Instance method (requires self)
    def shuffle(self):
        """Shuffles data and labels together."""
        import random
        combined = list(zip(self.data, self.labels))
        random.shuffle(combined)
        self.data, self.labels = zip(*combined)
        self.data = list(self.data)
        self.labels = list(self.labels)
    
    def normalize(self):
        """Normalizes the data."""
        mean = sum(sum(row) for row in self.data) / (len(self.data) * len(self.data[0]))
        self.data = [[(x - mean) for x in row] for row in self.data]
        self._preprocessed = True
    
    # Property (accessed like attribute, but is a method)
    @property
    def is_preprocessed(self):
        """Check if data has been preprocessed."""
        return self._preprocessed
    
    # Setter for property
    @is_preprocessed.setter
    def is_preprocessed(self, value):
        if not isinstance(value, bool):
            raise TypeError("Must be boolean")
        self._preprocessed = value
    
    # Class method (works with class, not instance)
    @classmethod
    def from_file(cls, filename):
        """Creates Dataset from file."""
        # Load data from file
        data = []  # Load logic here
        labels = []
        return cls(data, labels, name=filename)
    
    # Static method (doesn't need class or instance)
    @staticmethod
    def is_valid_format(filename):
        """Checks if file format is supported."""
        extension = filename.split('.')[-1]
        return extension in Dataset.supported_formats
    
    # Special method for string representation
    def __repr__(self):
        return f"Dataset(name='{self.name}', size={self.size})"
    
    # Special method for len()
    def __len__(self):
        return self.size
    
    # Special method for indexing
    def __getitem__(self, index):
        return self.data[index], self.labels[index]

# Usage examples
ds = Dataset([[1, 2], [3, 4]], [0, 1], name="MyDataset")

# Instance methods
ds.shuffle()
ds.normalize()

# Property access
print(ds.is_preprocessed)  # True (accessed like attribute)

# Class method
ds2 = Dataset.from_file("data.csv")

# Static method
if Dataset.is_valid_format("data.csv"):
    print("Valid format")

# Special methods
print(ds)              # Uses __repr__
print(len(ds))         # Uses __len__
sample = ds[0]         # Uses __getitem__

# Class attribute
print(f"Total datasets created: {Dataset.num_instances}")

4.3 Inheritance

# Base class
class Model:
    """Base model class."""
    
    def __init__(self, name):
        self.name = name
        self.is_trained = False
        self.training_history = []
    
    def train(self, X, y):
        """To be implemented by subclasses."""
        raise NotImplementedError("Subclass must implement train()")
    
    def predict(self, X):
        """To be implemented by subclasses."""
        raise NotImplementedError("Subclass must implement predict()")
    
    def evaluate(self, X, y):
        """Common evaluation logic."""
        predictions = self.predict(X)
        accuracy = sum([p == l for p, l in zip(predictions, y)]) / len(y)
        return accuracy
    
    def save_model(self, filepath):
        """Common save functionality."""
        print(f"Saving {self.name} to {filepath}")

# Derived class 1
class LinearRegressionModel(Model):
    """Linear regression implementation."""
    
    def __init__(self, name="LinearRegression"):
        super().__init__(name)  # Call parent constructor
        self.coefficients = None
        self.intercept = None
    
    def train(self, X, y, learning_rate=0.01, epochs=1000):
        """Implements training for linear regression."""
        # Initialize parameters
        self.coefficients = [0.0] * len(X[0])
        self.intercept = 0.0
        
        # Training loop
        for epoch in range(epochs):
            # Gradient descent logic
            loss = 0.0
            # ... compute and update
            self.training_history.append(loss)
        
        self.is_trained = True
        print(f"{self.name} training completed")
    
    def predict(self, X):
        """Makes predictions using linear model."""
        if not self.is_trained:
            raise ValueError("Model not trained!")
        
        predictions = []
        for sample in X:
            pred = self.intercept + sum([c * x for c, x in zip(self.coefficients, sample)])
            predictions.append(pred)
        return predictions

# Derived class 2
class DecisionTreeModel(Model):
    """Decision tree implementation."""
    
    def __init__(self, name="DecisionTree", max_depth=10):
        super().__init__(name)
        self.max_depth = max_depth
        self.tree = None
    
    def train(self, X, y):
        """Implements training for decision tree."""
        # Build tree
        self.tree = self._build_tree(X, y, depth=0)
        self.is_trained = True
        print(f"{self.name} training completed")
    
    def _build_tree(self, X, y, depth):
        """Helper method to build tree recursively."""
        if depth >= self.max_depth:
            return {'leaf': True, 'value': max(set(y), key=y.count)}
        pass  # Recursive tree-building (splitting) logic goes here
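
Since both subclasses share the Model interface, they can be used interchangeably. A brief usage sketch, assuming the tree-building placeholder above is filled in and X_train/y_train hold your data:

models = [LinearRegressionModel(), DecisionTreeModel(max_depth=5)]

for model in models:
    model.train(X_train, y_train)            # each subclass supplies its own train()
    model.save_model(f"{model.name}.pkl")    # shared behavior inherited from Model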
