Time Series Analysis

Time Series Analysis is the branch of data science that deals with data points indexed in time order. Standard regression assumes that observations are independent; in a time series the order of the data matters, because today’s value usually depends on yesterday’s.

1. Time Series Components

To analyze a time series, we first decompose it into four fundamental parts:

Trend: The long-term increase or decrease in the data (e.g., the global temperature rising over decades).
Seasonality: Patterns that repeat at fixed intervals (e.g., ice cream sales spiking every summer).
Cyclicity: Fluctuations that occur without a fixed period, often related to economic cycles (e.g., recessions that recur at irregular intervals of several years).
Noise (Residual): Random variations that cannot be explained by the trend, seasonality, or cycles.
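
As a rough illustration of the components above, here is a minimal sketch of a classical additive decomposition in plain Python. It assumes the series can be written as trend + seasonality + residual, that the seasonal period is known, and that at least two full cycles are available. Cyclicity is not separated out and ends up inside the trend estimate. The function names are hypothetical and the data is synthetic.

```python
def moving_average_trend(series, period):
    """Centered moving average; None where the window does not fit."""
    n = len(series)
    half = period // 2
    trend = [None] * n
    for t in range(half, n - half):
        window = series[t - half : t + half + 1]
        if period % 2 == 1:
            trend[t] = sum(window) / period
        else:
            # Even period: give the two end points half weight so the
            # window stays centered on t and covers exactly one cycle.
            trend[t] = (sum(window) - 0.5 * (window[0] + window[-1])) / period
    return trend


def decompose_additive(series, period):
    """Split a series into (trend, seasonal, residual) lists."""
    if len(series) < 2 * period:
        raise ValueError("need at least two full seasonal cycles")
    trend = moving_average_trend(series, period)
    detrended = [
        y - tr if tr is not None else None for y, tr in zip(series, trend)
    ]
    # Average the detrended values at each position within the cycle.
    means = []
    for pos in range(period):
        vals = [
            d for i, d in enumerate(detrended)
            if d is not None and i % period == pos
        ]
        means.append(sum(vals) / len(vals))
    # Center the seasonal effects so they sum to zero over one cycle.
    offset = sum(means) / period
    seasonal = [means[i % period] - offset for i in range(len(series))]
    residual = [
        y - tr - s if tr is not None else None
        for y, tr, s in zip(series, trend, seasonal)
    ]
    return trend, seasonal, residual


# Synthetic quarterly data: linear trend plus a repeating pattern.
pattern = [2.0, -1.0, -3.0, 2.0]
demo = [10 + 0.5 * t + pattern[t % 4] for t in range(16)]
trend, seasonal, residual = decompose_additive(demo, period=4)
print([round(s, 6) for s in seasonal[:4]])
```

The first and last few trend values are left as `None` because the centered window needs data on both sides.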

2. Stationarity

Most classical forecasting models assume the data is Stationary. ARIMA, for example, differences the series until it is stationary and then models the result.

What is it? A stationary time series is one whose statistical properties (Mean, Variance, and Autocorrelation) do not change over time. It has no trend and constant variance.
Why it matters: Models are easier to fit when the “rules” of the data don’t change.
How to achieve it:
- Differencing: Subtracting the previous value from the current value ($y_t - y_{t-1}$) to remove trend.
- Log Transform: To stabilize variance.
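How to test it: The Augmented Dickey-Fuller (ADF) Test. Its null hypothesis is that the series has a unit root (is non-stationary), so a p-value $< 0.05$ lets us reject that and treat the data as stationary. A larger p-value means stationarity has not been shown, and further differencing may be needed.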
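
The two transformations above can be sketched in a few lines of plain Python. This is a minimal illustration on synthetic data, not a full preprocessing pipeline; the helper names `difference` and `log_transform` are hypothetical.

```python
import math


def difference(series):
    """First difference y_t - y_{t-1}; the result is one value shorter."""
    return [series[t] - series[t - 1] for t in range(1, len(series))]


def log_transform(series):
    """Natural log of each value; only defined for strictly positive data."""
    if any(y <= 0 for y in series):
        raise ValueError("log transform needs strictly positive values")
    return [math.log(y) for y in series]


# A steady upward trend becomes a constant after one difference.
linear = [3 + 2 * t for t in range(6)]
print(difference(linear))  # [2, 2, 2, 2, 2]

# Exponential growth (10% per step): log first, then difference.
# Each resulting value is close to log(1.1), i.e. roughly constant.
growth = [100 * 1.1 ** t for t in range(6)]
log_diffs = difference(log_transform(growth))
print(log_diffs)
```

In practice the ADF test itself is usually run with a library routine (for example `adfuller` in statsmodels) rather than implemented by hand.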

3. ARIMA (AutoRegressive Integrated Moving Average)

ARIMA is often called the “Gold Standard” for non-seasonal time series forecasting, and it remains a widely used baseline. It combines three parts:

AR (AutoRegression – $p$): Models the current observation as a function of its own $p$ most recent lagged observations (past values).
I (Integrated – $d$): The number of times the raw observations are differenced to make the data stationary.
MA (Moving Average – $q$): Models the current observation as a function of the forecast errors (residuals) from the previous $q$ time steps.

Notation: $ARIMA(p, d, q)$. For example, $ARIMA(1, 1, 1)$ means one lagged observation, one round of differencing, and one lagged error term.
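
Written out, an $ARIMA(1, 1, 1)$ model takes the following form under one common convention (sign conventions for the MA term vary between textbooks and software). The symbols are assumptions introduced here for illustration: $y'_t = y_t - y_{t-1}$ is the differenced series, $c$ is a constant, $\phi_1$ is the AR coefficient, $\theta_1$ is the MA coefficient, and $\varepsilon_t$ is the error at time $t$.

$$y'_t = c + \phi_1 y'_{t-1} + \theta_1 \varepsilon_{t-1} + \varepsilon_t$$

The term $\phi_1 y'_{t-1}$ is the AR part ($p = 1$), working on $y'_t$ instead of $y_t$ is the I part ($d = 1$), and $\theta_1 \varepsilon_{t-1}$ is the MA part ($q = 1$).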

4. SARIMA (Seasonal ARIMA)

SARIMA is an extension of ARIMA that explicitly supports univariate time series data with a seasonal component.

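How it works: It adds four seasonal parameters to the ARIMA model, written $(P, D, Q)_m$, where $P$, $D$, and $Q$ are the seasonal counterparts of $p$, $d$, and $q$. The full notation is $ARIMA(p, d, q)(P, D, Q)_m$.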
The $m$ parameter: Represents the number of observations per seasonal cycle (e.g., $m=12$ for monthly data with yearly seasonality).
Use Case: Predicting monthly electricity consumption or retail sales where there is a clear “holiday” or “summer” effect.
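
To make the role of $m$ concrete, here is a minimal sketch of seasonal differencing, the operation that $D = 1$ corresponds to: each value is compared with the value one full cycle earlier. It illustrates only that one piece, not the seasonal AR and MA terms. The function name and the monthly data are hypothetical.

```python
def seasonal_difference(series, m):
    """Return y_t - y_{t-m}; the result is m values shorter."""
    if m <= 0 or m >= len(series):
        raise ValueError("m must be positive and shorter than the series")
    return [series[t] - series[t - m] for t in range(m, len(series))]


# Three years of synthetic monthly data: the same yearly shape,
# shifted up by 5 units each year.
yearly_shape = [20, 18, 15, 12, 10, 14, 22, 25, 16, 13, 17, 24]
monthly = [yearly_shape[t % 12] + 5 * (t // 12) for t in range(36)]

# With m=12 the repeating shape cancels out and only the
# year-over-year change of 5 remains.
print(seasonal_difference(monthly, m=12))
```

If the seasonally differenced series still shows a trend, an ordinary difference ($d = 1$) can be applied on top of it.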

5. Forecasting Techniques

Beyond ARIMA/SARIMA, there are several other ways to predict the future, ranging from classical smoothing methods to neural networks:

Exponential Smoothing (ETS)

Assigns exponentially decreasing weights to past observations. The most recent data is weighted more heavily than older data.

Simple: For data with no trend/seasonality.
Holt’s: For data with a trend.
Holt-Winters: For data with both trend and seasonality.
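
The simplest of these variants can be sketched in plain Python as follows. This is a minimal illustration of simple exponential smoothing only. The smoothing factor `alpha` and the choice to initialise the level with the first observation are assumptions of this sketch, and the function names are hypothetical.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the final smoothed level, used as a flat forecast."""
    if not series:
        raise ValueError("series must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]  # assumption: start from the first observation
    for y in series[1:]:
        # New level = weighted mix of the new value and the old level.
        level = alpha * y + (1 - alpha) * level
    return level


def smoothing_weights(alpha, n):
    """Weight on the latest, second-latest, ... observation."""
    return [alpha * (1 - alpha) ** k for k in range(n)]


print(simple_exponential_smoothing([10, 20], alpha=0.5))  # 15.0
print(smoothing_weights(0.5, 3))  # [0.5, 0.25, 0.125]
```

A larger `alpha` reacts faster to recent changes; a smaller one gives a smoother, slower-moving forecast. Holt’s and Holt-Winters add analogous update equations for the trend and seasonal components.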

Prophet (by Meta)

An open-source tool designed for business forecasting.

Pros: It is designed to be robust to missing data, outliers, and dramatic changes in trend (like a product launch). It can also model “Holiday” effects, given a list of holiday dates.

Deep Learning (LSTM & GRU)

Long Short-Term Memory (LSTM) networks are a type of Recurrent Neural Network (RNN) specifically designed to “remember” long-term dependencies in sequences. Gated Recurrent Units (GRUs) are a related, simpler gated architecture with fewer parameters.

Use Case: Highly complex, non-linear sequences such as high-frequency stock market data or speech signals.

6. Real-World Evaluation: AIC & BIC

When choosing between multiple time series models, we use:

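AIC (Akaike Information Criterion): Rewards goodness of fit but penalizes “complexity” (having too many parameters).
BIC (Bayesian Information Criterion): Works the same way, but its complexity penalty grows with the number of observations, so it tends to favor smaller models.
For both, lower is better, and values are only comparable between models fitted to the same data. You want the model that explains the data with the fewest parameters.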
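
Both criteria are simple functions of a fitted model’s maximized log-likelihood: $AIC = 2k - 2\ln L$ and $BIC = k\ln n - 2\ln L$, where $k$ is the number of estimated parameters and $n$ the number of observations. A minimal sketch of the comparison follows. The log-likelihoods and parameter counts are made-up numbers for two hypothetical models, not results from real data.

```python
import math


def aic(log_likelihood, k):
    """Akaike Information Criterion: 2k - 2 ln(L)."""
    return 2 * k - 2 * log_likelihood


def bic(log_likelihood, k, n):
    """Bayesian Information Criterion: k ln(n) - 2 ln(L)."""
    return k * math.log(n) - 2 * log_likelihood


n_obs = 100  # hypothetical number of observations

# Hypothetical fits: model B fits slightly better but uses twice
# as many parameters as model A.
model_a = {"log_likelihood": -120.0, "k": 3}
model_b = {"log_likelihood": -119.5, "k": 6}

for name, m in [("A", model_a), ("B", model_b)]:
    print(
        name,
        aic(m["log_likelihood"], m["k"]),
        round(bic(m["log_likelihood"], m["k"], n_obs), 2),
    )
```

Here model A has the lower AIC (246.0 versus 251.0), so the small gain in fit from model B does not justify its extra parameters.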

Summary Comparison

Technique	Best For	Complexity
Naive Forecast	Stable data with no change.	Very Low
ARIMA	Data with a trend but no seasonality.	Medium
SARIMA	Data with strong seasonal patterns.	High
Prophet	Business data with holidays/missing points.	Low (Automated)
LSTM	Massive datasets with non-linear patterns.	Very High
