Master time series forecasting: a complete guide to ARIMA, exponential smoothing, neural networks, and building forecasting systems.
Introduction: Time Series Forecasting
Time series forecasting is one of machine learning’s most important and practical applications.
Stock prices, weather forecasts, demand prediction, anomaly detection—all require understanding temporal patterns.
Yet time series is deceptively challenging. Unlike independent data points, time series has dependencies: today’s value depends on yesterday’s, which depends on the day before. This temporal structure must be captured carefully.
Moreover, time series has unique challenges:
- Non-stationarity (patterns change over time)
- Seasonality (repeating patterns)
- Trend (long-term direction)
- Exogenous variables (external factors)
- Concept drift (past patterns become invalid)
This guide covers the landscape of time series forecasting: from classical statistical methods to modern deep learning, from univariate to multivariate problems, from theory to production systems.
Time Series Fundamentals
What is a Time Series?
Sequence of observations ordered in time.
Examples:
- Stock prices (hourly, daily)
- Temperature (daily average)
- Website traffic (hourly)
- Sales (daily, weekly)
- Power consumption (15-minute intervals)
Key Concepts
Temporal Dependence: Value at time t depends on value at time t-1, t-2, etc.
Observation: Sales on Day 5 likely similar to Day 4
Because: Customer behavior, seasonality, and trends persist
Forecast Horizon: How far ahead to predict.
Short-term: hours to a day ahead (stock price next hour)
Medium-term: 1-3 months ahead (sales next quarter)
Long-term: 1+ year ahead (climate prediction)
Accuracy decreases with horizon
Forecast Frequency: How often to make predictions.
Real-time: Updated continuously (stock trading)
Daily: Updated once per day (weather)
Weekly: Updated once per week (demand)
Components of Time Series
Trend
Long-term direction, increasing or decreasing.
Examples:
- Stock price trending up over 5 years
- Climate warming long-term
- Website traffic growing month-over-month
Visualization:
Price ↑
| ╱╱╱
| ╱╱╱
|╱╱╱
Time →
Clear upward trend
Seasonality
Repeating pattern over fixed period.
Common Patterns:
- Daily: Temperature, website traffic
- Weekly: Retail sales (weekends different)
- Yearly: Holidays, weather seasons
- Other: Business cycles
Example:
Traffic
| ╱\ ╱\
| ╱ \╱ \
|╱________________
Time →
Repeating weekly pattern
Cyclicity
Repeating but irregular pattern (not fixed frequency).
Example:
Economic cycles (booms and recessions)
No fixed period, but a recurring oscillation
Difference from Seasonality: Fixed frequency vs. irregular
Noise (Irregular Component)
Random fluctuations, unexplained variation.
Example:
Stock price movements on individual news items
Random day-to-day weather variation
Decomposition
Separate into components:
Time Series = Trend + Seasonal + Cyclic + Noise
Example:
Stock price = long-term growth + January effect + economic cycle + daily volatility
Stationarity and Differencing
What is Stationarity?
Series with constant mean, variance, and autocorrelation over time.
Stationary Series:
Price oscillates around constant level
No trend
Variance consistent
Looks "random" but with patterns
Non-Stationary Series:
Price trends upward
Variance increases over time
Mean changes across periods
Why It Matters: Many algorithms assume stationarity. Non-stationary series must be transformed.
Testing for Stationarity
Visual Inspection:
- Plot series
- Look for trend, changing variance
- Rough but useful
Augmented Dickey-Fuller (ADF) Test:
- Statistical test
- H₀: Series is non-stationary
- p < 0.05: Reject null, series is stationary
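A minimal sketch with statsmodels, assuming `series` is a pandas Series:
# ADF test (H0: series is non-stationary)
from statsmodels.tsa.stattools import adfuller
adf_stat, p_value = adfuller(series.dropna())[:2]
print(f"ADF statistic: {adf_stat:.3f}, p-value: {p_value:.3f}")
# p < 0.05: reject H0, treat the series as stationary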
Differencing
Transform non-stationary to stationary.
First Difference:
Diff(t) = Value(t) - Value(t-1)
Example:
Original: [10, 12, 15, 18, 22]
Difference: [2, 3, 3, 4]
Removes trend
Seasonal Differencing:
Diff(t) = Value(t) - Value(t-12) # For monthly data with yearly seasonality
Removes seasonal pattern
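Both transforms are one-liners in pandas; a sketch assuming `series` is a pandas Series of monthly values:
# Differencing with pandas
first_diff = series.diff(1).dropna()      # removes trend
seasonal_diff = series.diff(12).dropna()  # removes yearly pattern in monthly data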
Classical Methods
ARIMA (AutoRegressive Integrated Moving Average)
Most successful traditional approach.
Components:
AR (AutoRegressive):
Value(t) = constant + a₁ × Value(t-1) + a₂ × Value(t-2) + ...
Use past values to predict future.
I (Integrated):
Differencing to make series stationary
MA (Moving Average):
Value(t) = constant + e(t) + b₁ × e(t-1) + b₂ × e(t-2) + ...
Use past errors in prediction.
ARIMA(p,d,q):
- p: Number of AR terms
- d: Differencing order
- q: Number of MA terms
Example:
ARIMA(1,1,1):
- Use 1 past value (AR)
- Difference once (I)
- Use 1 past error (MA)
Process:
- Test for stationarity
- Difference if needed
- Find optimal p, d, q
- Fit model
- Make predictions
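A minimal sketch with statsmodels, assuming `series` is a pandas Series with a regular DatetimeIndex:
# Fit ARIMA(1,1,1) and forecast
from statsmodels.tsa.arima.model import ARIMA
fitted = ARIMA(series, order=(1, 1, 1)).fit()  # order = (p, d, q)
forecast = fitted.forecast(steps=10)           # next 10 periods
In practice, p, d, and q are often chosen by minimizing an information criterion such as AIC (tools like pmdarima's auto_arima automate this search).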
Exponential Smoothing (ETS)
Give more weight to recent observations.
Simple:
Forecast = α × Recent_Value + (1-α) × Previous_Forecast
α = smoothing parameter (0 < α < 1)
Higher α = more weight to recent
With Trend (Holt’s): Captures both level and trend
With Seasonality (Holt-Winters): Captures level, trend, and seasonal components
Advantage: Simpler than ARIMA, works well in practice
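A Holt-Winters sketch with statsmodels, assuming `series` holds monthly values:
# Holt-Winters: level + additive trend + additive yearly seasonality
from statsmodels.tsa.holtwinters import ExponentialSmoothing
fitted = ExponentialSmoothing(series, trend="add", seasonal="add", seasonal_periods=12).fit()
forecast = fitted.forecast(12)  # next 12 months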
Machine Learning Approaches
Feature Engineering for Time Series
Lag Features:
For predicting day t:
lag_1 = value at day t-1
lag_7 = value at day t-7 (one week prior)
lag_365 = value at day t-365 (one year prior)
Captures: Momentum, weekly pattern, yearly seasonality
Rolling Statistics:
rolling_mean_7 = average of last 7 days
rolling_std_7 = volatility of last 7 days
rolling_max_7 = maximum of last 7 days
Captures: Trend, volatility
Time-Based Features:
hour = hour of day (0-23)
day_of_week = 0-6
month = 1-12
is_weekend = 0 or 1
Captures: Time-of-day patterns, weekly patterns, seasonality
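A pandas sketch that builds all three feature types, assuming `df` is a daily DataFrame with a DatetimeIndex and a hypothetical "sales" column:
# Lag, rolling, and time-based features
df["lag_1"] = df["sales"].shift(1)
df["lag_7"] = df["sales"].shift(7)
df["rolling_mean_7"] = df["sales"].shift(1).rolling(7).mean()  # shift(1) avoids leaking today's value
df["rolling_std_7"] = df["sales"].shift(1).rolling(7).std()
df["day_of_week"] = df.index.dayofweek
df["month"] = df.index.month
df["is_weekend"] = (df.index.dayofweek >= 5).astype(int)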
Machine Learning Models
Decision Trees / Random Forests:
- Can capture non-linear patterns
- Don’t assume any specific distribution
- Can overfit (need regularization)
Gradient Boosting (XGBoost, LightGBM):
- Often excellent performance
- Careful feature engineering needed
- Good baseline to beat
Linear Regression:
- Simple, interpretable
- Assumes linear relationship
- Works surprisingly well often
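A sketch tying the features above to a model, assuming `df` from the feature-engineering example; note that the split preserves time order rather than shuffling:
# Gradient boosting on lag features with a time-ordered split
from sklearn.ensemble import GradientBoostingRegressor
features = ["lag_1", "lag_7", "rolling_mean_7", "day_of_week", "is_weekend"]
data = df.dropna()
split = int(len(data) * 0.8)  # train on the past, evaluate on the future
train, test = data.iloc[:split], data.iloc[split:]
model = GradientBoostingRegressor().fit(train[features], train["sales"])
preds = model.predict(test[features])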
Deep Learning for Time Series
Recurrent Neural Networks (RNNs)
Process sequences one step at a time, maintaining hidden state.
LSTM (Long Short-Term Memory):
Input: [t-7, t-6, ..., t-1] # Past 7 days
Output: [t, t+1, ..., t+n-1] # Next n days
LSTM remembers important patterns
Processes entire sequence
Generates predictions
Advantages:
- Can capture complex patterns
- Handles variable-length sequences
- Learns what to remember/forget
Disadvantages:
- Sequential processing (slow)
- Requires lots of data
- Hard to interpret
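A minimal Keras sketch, assuming `X` is an array of shape (samples, 7, 1) holding sliding 7-day windows and `y` holds the corresponding next-day values:
# One-step-ahead LSTM
import tensorflow as tf
model = tf.keras.Sequential([
    tf.keras.layers.LSTM(32, input_shape=(7, 1)),  # read the 7-day window
    tf.keras.layers.Dense(1),                      # predict the next value
])
model.compile(optimizer="adam", loss="mae")
model.fit(X, y, epochs=10, batch_size=32)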
Attention Mechanisms
Allow model to focus on relevant parts of sequence.
When predicting day t:
Pay attention to: day t-1 (momentum)
Also attend to: day t-7 (weekly pattern)
Ignore: random daily fluctuations
Transformer Models:
- Parallel processing (faster than RNN)
- Strong performance
- Attention shows what model focuses on
Sequence-to-Sequence (Seq2Seq)
Encoder-decoder architecture.
Process:
Encoder: Process past values → compressed representation
Decoder: Generate future values from representation
Can generate multiple steps ahead
Flexible architecture
Handling Seasonality and Trends
Seasonal Decomposition
Separate series into components.
# Python example (assumes `series` is a pandas Series, e.g. monthly values)
from statsmodels.tsa.seasonal import seasonal_decompose
result = seasonal_decompose(series, period=12)  # 12 observations per seasonal cycle
trend = result.trend          # long-term direction
seasonal = result.seasonal    # repeating pattern
residual = result.resid       # leftover noise
Use: Understand components, forecast each separately.
Seasonal-Naive Baseline
Simple but effective baseline.
Forecast = Value from same season last period
Example (monthly data):
Forecast January 2025 = January 2024 value
Uses only seasonal pattern
Good baseline: Beat this with any model.
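In pandas this baseline is a one-liner, assuming monthly values in `series`:
# Seasonal-naive forecast for monthly data
seasonal_naive = series.shift(12)  # forecast = value from 12 months ago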
Detrending
Remove trend before modeling.
1. Compute trend (moving average)
2. Subtract trend from series
3. Model detrended series
4. Add trend back to prediction
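A sketch of these steps in pandas, assuming monthly values in `series`:
# Moving-average detrending
trend = series.rolling(window=12, center=True).mean()  # step 1: estimate trend
detrended = series - trend                             # step 2: subtract it
# steps 3-4: model `detrended`, then add the (extrapolated) trend back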
Multivariate Time Series
Multiple Input Variables
Predict one variable using others.
Example (Sales Forecasting):
Predict: Sales
Using: Price, advertising spend, competitor price, day of week, seasonality
Approach:
- Include all variables in features
- Models learn relationships
- Can capture interactions
Vector AutoRegression (VAR)
Like ARIMA but for multiple series.
Sales(t) = f(Sales(t-1), Price(t-1), Ads(t-1), ...)
Price(t) = f(Sales(t-1), Price(t-1), Ads(t-1), ...)
Advantage: Model dependencies between series.
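A statsmodels sketch, assuming `df` holds the series as columns (hypothetical "sales", "price", "ads") on a shared date index:
# Fit a VAR and forecast all series jointly
from statsmodels.tsa.api import VAR
results = VAR(df).fit(maxlags=4, ic="aic")  # lag order chosen by AIC, up to 4
forecast = results.forecast(df.values[-results.k_ar:], steps=8)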
Evaluation Metrics
Accuracy Metrics
MAE (Mean Absolute Error):
MAE = average(|prediction - actual|)
Units: Same as data
Interpretation: Average error magnitude
RMSE (Root Mean Square Error):
RMSE = √(average((prediction - actual)²))
Penalizes large errors more than MAE
MAPE (Mean Absolute Percentage Error):
MAPE = average(|prediction - actual| / |actual|) × 100%
Percentage error
Scale-independent, but undefined when actual values are zero
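All three are a few lines of NumPy, assuming aligned arrays `y_true` and `y_pred`:
# MAE, RMSE, MAPE
import numpy as np
mae = np.mean(np.abs(y_pred - y_true))
rmse = np.sqrt(np.mean((y_pred - y_true) ** 2))
mape = np.mean(np.abs((y_pred - y_true) / y_true)) * 100  # breaks if y_true contains zeros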
Directional Metrics
Direction Accuracy:
Did prediction go up when actual went up?
Did prediction go down when actual went down?
Percentage correct: 0-100%
Useful for: Trading, decision-making (not just magnitude).
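A sketch, again assuming aligned arrays `y_true` and `y_pred`:
# Directional accuracy: percentage of correctly predicted moves
import numpy as np
direction_acc = np.mean(np.sign(np.diff(y_pred)) == np.sign(np.diff(y_true))) * 100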
Benchmarking
Naive Forecasts:
- Last value (tomorrow = today)
- Seasonal naive (tomorrow = year ago)
- Drift (extrapolate trend)
Good model beats these.
Production Considerations
Retraining
Models degrade as data changes.
Strategy:
- Daily retraining (most common)
- Weekly retraining (if slower change)
- Triggered retraining (when accuracy drops)
Be Careful: Retraining has costs and can destabilize forecasts if done carelessly.
Handling Outliers
Unusual events break forecasts.
Examples:
- Stock market crash
- Holiday shutdown
- Pandemic disruption
- System outage
Strategies:
- Detect and handle separately
- Use robust methods (less sensitive to outliers)
- Manual intervention
- Model uncertainty (wider confidence intervals)
Uncertainty Quantification
Report not just a point forecast, but also a confidence interval.
Why: Better decision-making, risk management.
Methods:
- Quantile regression (forecast percentiles)
- Bootstrap (resample residuals)
- Bayesian (posterior distributions)
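A residual-bootstrap sketch for a one-step-ahead interval, assuming `residuals` is an array of in-sample errors from a fitted model and `point_forecast` is its next-step prediction:
# ~95% interval by resampling residuals
import numpy as np
rng = np.random.default_rng(0)
samples = point_forecast + rng.choice(residuals, size=10_000, replace=True)
lower, upper = np.percentile(samples, [2.5, 97.5])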
Key Takeaways
✓ Time series has temporal dependence – Today depends on yesterday
✓ Stationarity matters – Transform if needed
✓ Components exist: Trend, seasonal, cyclic, noise – Decompose if possible
✓ Classical methods work well – ARIMA, exponential smoothing still competitive
✓ Feature engineering critical – Lags, rolling stats, time features
✓ Deep learning powerful – LSTM, attention, seq2seq for complex patterns
✓ Seasonality important – Often easy to capture, big impact
✓ Evaluation has nuances – Multiple metrics, directional accuracy
✓ Production is hard – Retraining, outliers, uncertainty
✓ No silver bullet – Try multiple approaches, compare
Frequently Asked Questions
Q: Should I use ARIMA or machine learning?
A: Try both. ARIMA for stable patterns. ML for complex relationships. Ensemble both if possible.
Q: How much history do I need?
A: 2-3 years minimum to capture yearly seasonality. More data is better. ML models need more data than ARIMA.
Q: How do I handle missing data?
A: Interpolate (fill forward, linear), remove (if few), model explicitly. Choose based on pattern.
Q: Can I predict stock prices?
A: Not consistently. Markets are largely efficient; past prices don't reliably predict future prices. Use these techniques for other time series.
Q: How do I know confidence interval is right?
A: Check: 95% CI should contain actual value ~95% of time. Validate on test set.

