Time Series Forecasting - DeepAR
Introduction to DeepAR Time Series Forecasting (Lesson 155)
- DeepAR is Amazon SageMaker’s built-in time series forecasting algorithm that uses recurrent neural networks (RNNs).
Components of Time Series
- Noise: Random fluctuations that can’t be predicted.
- Trend: Long-term increase or decrease in the target value.
- Seasonality: Patterns that repeat at specific intervals.
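As a rough illustration of the three components above, the sketch below builds a synthetic series by adding a linear trend, a sine-shaped seasonal pattern, and Gaussian noise. The function name and all parameter values (slope, period, amplitude, noise level) are hypothetical choices made for the example and do not come from the lesson.

```python
import math
import random


def make_series(n, slope=0.5, period=7, amplitude=10.0, noise_sd=1.0, seed=0):
    """Build a toy series as trend + seasonality + noise (illustrative values only)."""
    rng = random.Random(seed)  # seeded so the example is reproducible
    # Trend: long-term increase of `slope` per time step.
    trend = [slope * t for t in range(n)]
    # Seasonality: a pattern that repeats every `period` steps.
    seasonality = [amplitude * math.sin(2 * math.pi * t / period) for t in range(n)]
    # Noise: random fluctuations that cannot be predicted.
    noise = [rng.gauss(0.0, noise_sd) for _ in range(n)]
    series = [tr + s + e for tr, s, e in zip(trend, seasonality, noise)]
    return series, trend, seasonality, noise


series, trend, seasonality, noise = make_series(28)
```

With period=7 the seasonal component repeats weekly when each point stands for one day.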
Benefits of Using DeepAR
- Handling Non-Stationarity: Unlike ARIMA, DeepAR doesn’t require removing seasonal and trend components, simplifying the process.
- Multiple Time Series: Capable of training on multiple related time series, useful for diverse applications like utility usage forecasting.
- Feature Inclusion: Supports flags for events like Black Friday or Diwali to improve forecasting accuracy.
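An event flag of the kind mentioned above can be represented as a 0/1 series aligned with the target. The following is a minimal sketch assuming daily data; the helper name event_flags and the example dates are hypothetical and only show the shape of such a feature.

```python
from datetime import date, timedelta


def event_flags(start, n_days, event_dates):
    """Return one 0/1 value per day: 1 on an event date, 0 otherwise."""
    events = set(event_dates)
    return [1 if start + timedelta(days=i) in events else 0 for i in range(n_days)]


# One week of daily flags with Black Friday 2023 (24 November) marked.
flags = event_flags(date(2023, 11, 20), 7, [date(2023, 11, 24)])
```

A list like this would typically be supplied as one of the arrays in dynamic_feat.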
Training Data Division
- Other Algorithms: Random division into training and test sets.
- DeepAR: Requires time-ordered data; observations cannot be shuffled or divided at random, because the order of the points carries the signal.
Prediction Length Hyperparameter
- DeepAR Specific: Defines how far in the future the model forecasts.
- Immutability: Once a model is trained, its prediction length is fixed; forecasting a different horizon requires retraining.
Training and Test Set Splitting
- Training Set: Comprises the entire time series except the last ‘prediction length’ number of points.
- Test Set: Includes the entire time series, encompassing the last ‘prediction length’ points.
- Formats Supported: Training files can be JSON Lines (optionally gzip-compressed) or Parquet.
- Inference Input: Requires JSON format.
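The split described above can be sketched in a few lines. This is a minimal illustration, assuming the series is held in a plain Python list; the function name split_for_deepar is hypothetical and not part of any SageMaker API.

```python
def split_for_deepar(target, prediction_length):
    """Split one time series by time, never at random.

    Training set: everything except the last `prediction_length` points.
    Test set: the entire series, including those last points.
    """
    if prediction_length <= 0 or prediction_length >= len(target):
        raise ValueError("prediction_length must be between 1 and len(target) - 1")
    train = list(target[:-prediction_length])
    test = list(target)
    return train, test


daily_usage = [10, 12, 11, 13, 15, 14, 16, 18, 17, 19]
train, test = split_for_deepar(daily_usage, prediction_length=3)
```

The held-out points at the end of the test series are what the forecasts are compared against during evaluation.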
Structure of Training File
- start: Timestamp of the first observation in the time series.
- target: Array of floating-point or integer values, one per time step.
- dynamic_feat: Optional input features, given as an array of arrays (one inner array per feature).
- cat: Optional array of categorical features that encodes the group or groups a time series belongs to.
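To make the field list above concrete, here is a sketch of two training records serialized as JSON Lines, one JSON object per line. All values are invented for illustration, and the length check reflects the assumption that, in training data, each dynamic feature array has the same length as target.

```python
import json

# Two hypothetical series; cat=[0] and cat=[1] place them in different groups.
records = [
    {
        "start": "2024-01-01 00:00:00",
        "target": [5.0, 6.5, 7.1, 6.8],
        "cat": [0],
        "dynamic_feat": [[0, 0, 1, 0]],  # one feature, e.g. a holiday flag
    },
    {
        "start": "2024-01-01 00:00:00",
        "target": [20, 22, 21, 25],
        "cat": [1],
        "dynamic_feat": [[0, 0, 1, 0]],
    },
]

# Assumed rule for training data: every dynamic feature matches target in length.
for record in records:
    for feature in record["dynamic_feat"]:
        assert len(feature) == len(record["target"])

# JSON Lines: each record becomes exactly one line of text.
jsonl_text = "\n".join(json.dumps(record) for record in records)
```

Writing jsonl_text to a file gives a training file in the JSON Lines layout.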
Challenges with Dynamic Features
- No Missing Values Allowed: Requires handling missing values for dynamic features.
- Future Prediction: Requires future values of dynamic features for forecasting.
- Inference Request Format: Similar to the training format, but the time series are wrapped in an ‘instances’ field.
- Necessity of Complete Time Series: Includes historical data, dynamic features, and categories for prediction.
- Length Consideration: At inference time, each dynamic feature array should contain as many values as the length of the target plus the prediction length, so that the forecast window is covered.
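The points above can be combined into a small sketch that assembles an inference request and checks the dynamic feature lengths before sending it. The helper name build_request is hypothetical, and the ‘configuration’ block (num_samples, output_types, quantiles) is an assumption about the request layout rather than something stated in these notes.

```python
import json


def build_request(start, target, dynamic_feat, cat, prediction_length):
    """Assemble a JSON-serializable inference request for one time series."""
    expected = len(target) + prediction_length
    for i, feature in enumerate(dynamic_feat):
        # Dynamic features must also cover the future window being forecast.
        if len(feature) != expected:
            raise ValueError(
                f"dynamic_feat[{i}] has {len(feature)} values, expected {expected}"
            )
        # Missing values are not allowed in dynamic features.
        if any(value is None for value in feature):
            raise ValueError(f"dynamic_feat[{i}] contains missing values")
    instance = {
        "start": start,
        "target": target,
        "cat": cat,
        "dynamic_feat": dynamic_feat,
    }
    return {
        "instances": [instance],
        # Assumed optional settings; adjust or drop as needed.
        "configuration": {
            "num_samples": 50,
            "output_types": ["mean", "quantiles"],
            "quantiles": ["0.1", "0.5", "0.9"],
        },
    }


request = build_request(
    start="2024-01-01 00:00:00",
    target=[5.0, 6.5, 7.1, 6.8],
    dynamic_feat=[[0, 0, 1, 0, 0, 1]],  # 4 historical + 2 future values
    cat=[0],
    prediction_length=2,
)
payload = json.dumps(request)
```

Here the target has four points and the prediction length is two, so the single dynamic feature carries six values.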