NHiTS: Interpretable Deep Learning for Time Series Forecasting
A Bridge Between Modern Neural Networks and Traditional Machine Learning Algorithms
In the rapidly evolving field of machine learning, neural networks have revolutionized our ability to tackle complex problems. As problems become more complex, so do our models, which generally hurts transparency and interpretability. Personally, I have a particular interest in models that try to balance complexity and interpretability, which brings me to NHiTS. Neural Hierarchical Interpolation for Time Series stands out as a powerful tool for time series forecasting. What makes the model particularly intriguing is that its interpretability bridges the gap between modern deep learning and traditional machine learning algorithms. Even though it is not quite state of the art as of 2024, I think it is still worth diving into the beauty of this model and exploring its connections to the well-established concepts of basis expansion and residual learning from more traditional ML.
About NBEATS
Before NHiTS, there was NBEATS (Neural Basis Expansion Analysis for Time Series), a pioneering neural network architecture designed specifically for time series forecasting. It introduced the concept of using backward and forward residual links to learn time series representations without relying on external feature engineering or domain knowledge. The architecture is based on stacks of fully connected layers arranged in blocks, each responsible for capturing different aspects of the time series. NBEATS demonstrated remarkable accuracy and interpretability, setting the stage for further advancements in the field. NHiTS builds on these strengths by introducing hierarchical decomposition, enhancing the model's ability to handle more complex temporal patterns through its multi-resolution approach. Although there are situations where NBEATS might still perform better (here, too, there is no free lunch), NHiTS tends to outperform it by around 25% on average.
Basis Expansion
One of the most crucial aspects of these models is their use of basis expansion, a concept we know from traditional ML. In polynomial regression, for example, we transform the original features into a higher-dimensional space using polynomial functions, enabling the model to capture non-linear relationships:
$$f(X) = \sum_{m=1}^{M} \beta_m h_m(X)$$

where h_m is the m-th transformation of X.
Some widely used basis functions are:
- h_m(X) = X_m, which recovers the original linear model.
- h_m(X) = X_j² or h_m(X) = X_j·X_k, which allow augmenting the inputs with polynomial terms to achieve higher-order Taylor expansions.
- h_m(X) = log(X_j), √X_j and others, which allow for other non-linear transformations.
- h_m(X) = I(L_m < X_k < U_m), which is an indicator for the region of X_k. By breaking the range of X_k into a set of non-overlapping regions, we obtain a model with piecewise-constant contributions of X_k.
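To make this concrete, here is a minimal sketch of basis expansion in Python. The data, the degree, and all variable names are made up for illustration; the point is that the model stays linear in the coefficients β_m while the features h_m(X) carry the non-linearity.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)  # non-linear target

# h_m(X): polynomial transformations of X up to degree 5
H = PolynomialFeatures(degree=5, include_bias=True).fit_transform(X)

# The model is still linear, but now in the expanded features
model = LinearRegression(fit_intercept=False).fit(H, y)
print(model.coef_)  # the beta_m in f(X) = sum_m beta_m * h_m(X)
```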
Although the general implementation of NBEATS/NHiTS imposes no limitation on the basis expansion, the interpretable time series version uses polynomial expansions to capture the trend, and a Fourier basis for seasonality:

$$\hat{y}_{\text{trend}} = \sum_{i=0}^{p} \theta_i\, t^i, \qquad \hat{y}_{\text{seasonality}} = \sum_{i=1}^{\lfloor H/2 \rfloor} \big(a_i \cos(2\pi i t) + b_i \sin(2\pi i t)\big)$$

where t is a normalized time vector and the coefficients θ_i, a_i, b_i are produced by the fully connected layers of each block.
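As a rough sketch of what these bases look like in code (the normalization of t, the degree, and the number of harmonics here are illustrative choices, not the papers' exact implementation):

```python
import numpy as np

def trend_basis(horizon: int, degree: int) -> np.ndarray:
    """Polynomial basis: rows are t**0, t**1, ..., t**degree."""
    t = np.arange(horizon) / horizon  # normalized time in [0, 1)
    return np.stack([t**i for i in range(degree + 1)])  # (degree+1, horizon)

def seasonality_basis(horizon: int, harmonics: int) -> np.ndarray:
    """Fourier basis: cosine and sine waves of increasing frequency."""
    t = np.arange(horizon) / horizon
    cos = [np.cos(2 * np.pi * i * t) for i in range(1, harmonics + 1)]
    sin = [np.sin(2 * np.pi * i * t) for i in range(1, harmonics + 1)]
    return np.stack(cos + sin)  # (2*harmonics, horizon)

# A block only has to emit a few coefficients theta; the forecast is theta @ basis,
# which is what makes the output directly interpretable as trend/seasonality.
theta = np.array([0.1, 0.5, -0.2])            # e.g. produced by the FC layers
forecast = theta @ trend_basis(24, degree=2)  # smooth, interpretable trend
```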
Hierarchical Decomposition
In NHiTS, the time series is decomposed into multiple hierarchical components. These components are processed by fully connected (FC) layers, which learn to represent and predict specific aspects of the time series. This decomposition allows the model to perform remarkably well in long-horizon forecasting, while also greatly reducing computational complexity. It is achieved by the MaxPool layer in each block, which allows for multi-rate signal sampling.
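Here is a minimal PyTorch sketch of one such block, assuming a univariate input window, linear interpolation, and illustrative layer sizes; this is a simplified illustration of the idea, not the official implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NHiTSBlock(nn.Module):
    """Sketch of one block: multi-rate sampling (MaxPool) + hierarchical interpolation."""

    def __init__(self, input_size, horizon, pool_size, n_coeffs, hidden=256):
        super().__init__()
        self.input_size = input_size
        self.horizon = horizon
        self.pool = nn.MaxPool1d(pool_size, stride=pool_size)  # sub-sample the input
        pooled = input_size // pool_size
        self.mlp = nn.Sequential(
            nn.Linear(pooled, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, input_size + n_coeffs),  # backcast + low-res forecast
        )

    def forward(self, x):  # x: (batch, input_size)
        z = self.pool(x.unsqueeze(1)).squeeze(1)  # coarser view of the series
        out = self.mlp(z)
        backcast = out[:, : self.input_size]
        coeffs = out[:, self.input_size :]        # few coefficients, not H values
        # Hierarchical interpolation: upsample the coefficients to the full horizon
        forecast = F.interpolate(
            coeffs.unsqueeze(1), size=self.horizon, mode="linear", align_corners=True
        ).squeeze(1)
        return backcast, forecast
```

Because each block only predicts a handful of coefficients and interpolates them up to the horizon, the cost of long-horizon forecasting no longer scales with the horizon length itself.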
Residual Learning
Another parallel can be drawn with the concept of residual learning in gradient boosting algorithms like XGBoost. In gradient boosting, new models are sequentially added to correct the residuals (errors) of the previous models, iteratively refining the overall prediction.
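As a toy illustration of this residual-fitting loop (synthetic data, depth-one trees as weak learners, and a made-up learning rate):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(300, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(300)

prediction = np.zeros_like(y)
residual = y.copy()
for _ in range(50):
    stump = DecisionTreeRegressor(max_depth=1).fit(X, residual)
    prediction += 0.1 * stump.predict(X)  # learning rate 0.1
    residual = y - prediction             # next model targets what is still unexplained
```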
NBEATS and NHiTS employ a similar strategy through their block structure. Each block in the model is designed to learn and correct the residuals from the previous block. By focusing on the residuals, each subsequent block incrementally improves the model's performance. This process of iterative refinement mirrors the additive model in gradient boosting, where each new model is added to improve upon the errors of the existing ensemble.
This ensures that the final forecast is a finely tuned combination of all the hierarchical components.
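Continuing the block sketch from above, the doubly residual stacking could look roughly like this (again an illustration under the same assumptions, not the reference implementation):

```python
import torch.nn as nn

class NHiTSStack(nn.Module):
    """Doubly residual stacking: subtract backcasts, sum forecasts (sketch)."""

    def __init__(self, blocks):
        super().__init__()
        self.blocks = nn.ModuleList(blocks)

    def forward(self, x):
        residual = x
        forecast = 0.0
        for block in self.blocks:
            backcast, block_forecast = block(residual)
            residual = residual - backcast        # each block explains part of the input
            forecast = forecast + block_forecast  # contributions add up, as in boosting
        return forecast

# e.g. one block per sampling rate, reusing the NHiTSBlock sketch above:
# stack = NHiTSStack([NHiTSBlock(96, 24, pool, 8) for pool in (8, 4, 2)])
```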
The Beauty of Interpretability
Besides making for a very powerful forecasting model, this interpretability is invaluable for understanding the underlying dynamics of the data and for making informed decisions based on the model's predictions. It also bridges the gap between modern neural networks and traditional machine learning algorithms. For me, this blend of a deep learning architecture with familiar machine learning principles makes for a robust and insightful approach to time series forecasting.
References
- "Non-linear regression: basis expansion, polynomials & splines." Towards Data Science. Available at: https://towardsdatascience.com/non-linear-regression-basis-expansion-polynomials-splines-53bd2e8cca57
- Oreshkin, B. N., et al. "N-BEATS: Neural Basis Expansion Analysis for Interpretable Time Series Forecasting." arXiv. Available at: https://arxiv.org/abs/1905.10437
- Challu, C., et al. "N-HiTS: Neural Hierarchical Interpolation for Time Series Forecasting." arXiv. Available at: https://arxiv.org/pdf/2201.12886v2