library(tidyverse)
library(forecast)
library(fma)
library(magrittr)
library(fpp3)
library(tsibble)
library(lubridate)
library(feasts)
Simple Forecasting Methods
Some methods are extremely simple yet surprisingly effective. The following methods can be considered the benchmark against which other models will be tested.
Average Method
Here, the forecasts of all future values are equal to the average (mean) of the historical data.
\[ \hat{y}_{T + h|T} = \bar{y} = (y_1 + \ldots + y_T)/T \]
bricks <-
aus_production %>%
filter(between(year(Quarter), 1970, 2004))
bricks %>%
model(MEAN(Bricks)) %>%
forecast() %>%
autoplot(bricks, level = NULL)
Naive Method
For naive forecasts, we simply set all forecasts to be the value of the last observation:
\[ \hat{y}_{T + h|T} = y_T \]
Because a naive forecast is optimal when data follow a random walk, these are also called random walk forecasts.
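As a minimal sketch (reusing the bricks series defined above, and not part of the original notes), the naive method can be fitted and plotted with NAIVE() in the same way as the mean method:
# Naive forecasts of the bricks series (illustrative sketch)
bricks %>%
model(naive = NAIVE(Bricks)) %>%
forecast(h = '5 years') %>%
autoplot(bricks, level = NULL)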
Seasonal Naive Method
A similar method is useful for highly seasonal data. Each forecast is set to be the last observed value from the same season of the year.
\[ \hat{y}_{T + h|T} = y_{T + h - m(k+1)} \]
Where \(m\) is the seasonal period, and \(k\) is the integer part of \((h - 1)/m\), i.e. the number of complete years in the forecast period prior to time \(T + h\).
e.g. for monthly data, the forecast for all future February values is equal to the last observed February value.
bricks %>%
model(snaive = SNAIVE(Bricks ~ lag('year'))) %>%
forecast(h = '5 years') %>%
autoplot(bricks)
Drift Method
A variation on the naive method is to allow the forecasts to increase or decrease over time, where the amount of change over time is called drift. This is set to the average change in the historical data.
\[ \hat{y}_{T + h|T} = y_T + \frac{h}{T-1} \sum_{t=2}^T (y_t - y_{t-1} ) = y_T + h \bigg( \frac{ y_T - y_1 }{ T - 1 } \bigg) \]
This is equivalent to drawing a line between the first and last observations and extrapolating it into the future.
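A minimal sketch of the drift method on the same bricks series, using fable's RW() with drift() (this chunk is an illustration added here, not from the original notes):
# Random walk with drift forecasts (illustrative sketch)
bricks %>%
model(drift = RW(Bricks ~ drift())) %>%
forecast(h = '5 years') %>%
autoplot(bricks, level = NULL)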
Examples
Applying the first three methods to the quarterly beer production data:
ausbeer_training <-
aus_production %>%
filter_index("1992 Q1" ~ "2006 Q4")
beer_fc <-
ausbeer_training %>%
model(
mean = MEAN(Beer),
naive = NAIVE(Beer),
seasonal_naive = SNAIVE(Beer)
) %>%
forecast(h = 14)
beer_fc %>%
autoplot(ausbeer_training, level = NULL) +
autolayer(
filter_index(aus_production, "2007 Q1" ~ .),
Beer,
colour = 'black'
) +
labs(
x = 'Year',
y = 'Megalitres',
title = 'Forecasts for Quarterly Beer Production',
colour = 'Forecast Method'
)
Non-seasonal methods applied to the daily Google stock closing prices for 2015:
google_stock <-
gafa_stock %>%
filter(Symbol == 'GOOG') %>%
mutate(day = row_number()) %>%
update_tsibble(index = day, regular = TRUE)
google_2015 <-
google_stock %>%
filter(year(Date) == 2015)
google_fit <-
google_2015 %>%
model(
mean = MEAN(Close),
naive = NAIVE(Close),
drift = NAIVE(Close ~ drift())
)
google_jan_2016 <-
google_stock %>%
filter(yearmonth(Date) == yearmonth('2016 Jan'))
# Use the filtered data as the forecast horizon, rather than
# using h = <forecast>
google_fit %>%
forecast(google_jan_2016) %>%
autoplot(google_2015, level = NULL) +
autolayer(google_jan_2016, Close, colour = 'black') +
labs(
x = 'Day',
y = 'Closing Price (USD)',
title = 'Google Stock',
subtitle = 'Closing Price Forecasts'
) +
guides(
colour = guide_legend(title = 'Forecast')
)
Sometimes these methods are the best forecasting methods available, but mainly they are used as benchmarks - i.e. if a more complex forecasting method can’t perform better than these, it shouldn’t be considered.
Fitted Values and Residuals
Each observation can be forecast using all previous observations. These are called fitted values and are denoted by \(\hat{y}_{t|t-1}\), or simply \(\hat{y}_t\). They always involve one-step forecasts.
They are often not true forecasts as they are estimated using all available observations, including future ones. For example, in the average method, \(\hat{y}_t = \hat{c}\), where \(\hat{c}\) is the average of all available observations, including those after \(t\).
Residuals
Residuals in a time series model are what is ‘left over’ after fitting a model. For many (but not all) models, the residuals are \(e_t = y_t - \hat{y}_t\).
The augment() function can be used to calculate the residuals:
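Presumably along these lines, using the google_fit mable from earlier (the original chunk isn't shown in these notes):
# Fitted values and residuals for each benchmark model
google_fit %>% augment()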
## # A tsibble: 756 x 6 [1]
## # Key: Symbol, .model [3]
## Symbol .model day Close .fitted .resid
## <chr> <chr> <int> <dbl> <dbl> <dbl>
## 1 GOOG mean 253 522. 602. -79.6
## 2 GOOG mean 254 511. 602. -90.5
## 3 GOOG mean 255 499. 602. -102.
## 4 GOOG mean 256 498. 602. -103.
## 5 GOOG mean 257 500. 602. -102.
## 6 GOOG mean 258 493. 602. -108.
## 7 GOOG mean 259 490. 602. -112.
## 8 GOOG mean 260 493. 602. -108.
## 9 GOOG mean 261 498. 602. -103.
## 10 GOOG mean 262 499. 602. -103.
## # … with 746 more rows
Residual Diagnostics
A good forecasting method will yield residuals with the following properties:
- The residuals are uncorrelated.
- The residuals have a zero mean.
Adjusting the bias is easy - if the residuals have mean \(m\), add \(m\) to all of the forecasts. Fixing correlation is harder.
It is also useful (but not necessary) for the residuals to have:
- Constant variance
- A normal distribution
These two make the calculation of prediction intervals easier.
Example
For stock market prices and indexes, the naive method is often the best. Each forecast is equal to the last observed value, so the residuals are equal to the differences between consecutive observations:
\[ e_t = y_t - \hat{y_t} = y_t - y_{t-1} \]
Let’s take a look at the Google daily closing stock price:
google_2015 %>%
autoplot(Close) +
labs(
x = 'Day',
y = 'Closing Price (USD)',
title = 'Google Closing Stock Price (2015)'
)
Let’s take a look at the residuals from forecasting using the naive method:
google_2015 %>%
model(naive = NAIVE(Close)) %>%
augment() %>%
autoplot(.resid) +
labs(
x = 'Day',
y = 'Naive Residual (USD)',
title = 'Google Stock Price 2015',
subtitle = 'Naive Model Residuals'
)
## Warning: Removed 1 row(s) containing missing values (geom_path).
google_2015 %>%
model(naive = NAIVE(Close)) %>%
augment() %>%
ggplot() +
geom_histogram(aes(.resid), binwidth = 2) +
labs(
x = 'Residual',
y = 'Count',
title = 'Google Stock Price (2015)',
subtitle = 'Naive Model - Residual Histogram'
)
## Warning: Removed 1 rows containing non-finite values (stat_bin).
We see the right tail is a little too long for a normal distribution.
google_2015 %>%
model(naive = NAIVE(Close)) %>%
augment() %>%
ACF(.resid) %>%
autoplot() +
labs(
x = 'Lag',
y = 'Autocorrelation Coefficient',
title = 'Google Naive Model - Residual ACF'
)
The graphs show that the naive method appears to account for all available information. The mean of the residuals is close to zero and there is no significant correlation between residuals, as shown in the ACF plot. The time plot shows that the residual variation is reasonably constant, with the exception of one outlier.
The gg_tsresiduals() function is a convenient shortcut.
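Presumably it was called along these lines (a sketch; the original chunk isn't shown):
# Time plot, ACF and histogram of the naive residuals in a single figure
google_2015 %>%
model(naive = NAIVE(Close)) %>%
gg_tsresiduals()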
## Warning: Removed 1 row(s) containing missing values (geom_path).
## Warning: Removed 1 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing non-finite values (stat_bin).
Portmanteau Test for Autocorrelation
In addition to the ACF plot, there is a more formal test for autocorrelation, in which the whole set of \(r_k\) values is treated as a group rather than individually. Recall that \(r_k\) is the autocorrelation for lag \(k\).
The ACF plot is essentially a set of multiple hypothesis tests, each with some probability of a false positive. With enough tests it is likely that at least one will give a false positive, leading us to conclude there is some autocorrelation when there is none.
To overcome this, the first \(\ell\) autocorrelations are tested to see whether they are significantly different from what would be expected from a white noise process.
One such test is the Box-Pierce test:
\[ Q = T \sum_{k=1}^\ell r^2_k \]
where \(\ell\) is the maximum lag being considered and \(T\) is the number of observations. If each \(r_k\) is close to zero, \(Q\) will be small. If some \(r_k\) values are large, then \(Q\) will be large. A suggestion is to use \(\ell = 10\) for non-seasonal data and \(\ell = 2m\) for seasonal data, where \(m\) is the seasonal period. However, the test is not good when \(\ell\) is large, so if that value is larger than \(T/5\), use \(\ell = T/5\).
A related and more accurate test is the Ljung-Box test:
\[ Q^* = T(T + 2) \sum_{k=1}^\ell (T - k)^{-1} r^2_k \]
Again, large values of \(Q^*\) suggest the autocorrelations do not come from a white noise series.
How large is too large? If the autocorrelations did come from a white noise series, then both \(Q\) and \(Q^*\) would have a \(\chi^2\) distribution with \((\ell - K)\) degrees of freedom, where \(K\) is the number of parameters in the model. If calculated from raw data (rather than residuals), then \(K = 0\).
For the Google stock price, the naive model has no parameters, so \(K = 0\) here as well.
google_2015 %>%
model(naive = NAIVE(Close)) %>%
augment() %>%
features(.resid, ljung_box, lag = 10, dof = 0)
# A tibble: 1 x 4
Symbol .model lb_stat lb_pvalue
<chr> <chr> <dbl> <dbl>
1 GOOG naive 7.91 0.637
For this test, the results are not significant, so we would say that the series is not distinguishable from white noise.
An alternative approach could be the drift method:
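The code behind the parameter estimate below isn't shown; presumably the drift model was fitted and tidied along these lines:
# Estimate of the drift coefficient
google_2015 %>%
model(drift = RW(Close ~ drift())) %>%
tidy()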
## # A tibble: 1 x 7
## Symbol .model term estimate std.error statistic p.value
## <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 GOOG drift b 0.944 0.705 1.34 0.182
The estimated parameter is the drift coefficient which measures the average daily change observed in the historical data.
Applying the Ljung-Box test:
google_2015 %>%
model(drift = RW(Close ~ drift())) %>%
augment() %>%
features(.resid, ljung_box, lag = 10, dof = 1)
## # A tibble: 1 x 4
## Symbol .model lb_stat lb_pvalue
## <chr> <chr> <dbl> <dbl>
## 1 GOOG drift 7.91 0.543
Again, the p-value is high, so the residuals are indistinguishable from white noise.
Forecast Distributions
Uncertainty in the forecasts is expressed using a probability distribution. The point forecast is the mean of this distribution. Most time series models produce normally distributed forecasts.
Prediction Intervals
A prediction interval is the interval within which we expect \(y_t\) to lie with a specified probability.
Assuming a normal distribution, the 95% prediction interval for the \(h\)-step forecast is:
\[ \hat{y}_{T+h|T} \pm 1.96\hat{\sigma}_h \] where \(\hat{\sigma}_h\) is the standard deviation of the \(h\)-step forecast distribution.
More generally, the interval can be written as \(\hat{y}_{T+h|T} \pm c\hat{\sigma}_h\), where \(c\) is the multiplier corresponding to the desired coverage probability.
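The three values below are the multipliers for 50%, 80% and 95% intervals, presumably obtained from the normal quantile function (a sketch of the calls):
# Multipliers c for 50%, 80% and 95% prediction intervals
qnorm(0.75)
qnorm(0.90)
qnorm(0.975)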
[1] 0.6744898
[1] 1.281552
[1] 1.959964
One-Step Prediction Intervals
When forecasting one-step ahead, the standard deviation of the forecast distribution is almost the same as the standard deviation of the residuals.
[1] 758.88
# Standard deviation of the residuals
google_2015 %>%
model(naive = NAIVE(Close)) %>%
augment() %>%
pull(.resid) %>%
sd(na.rm = TRUE)
[1] 11.17197
[1] 744.5651 773.1949
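As a rough check (a sketch, assuming the last closing price and the residual standard deviation shown above stand in for \(\hat{\sigma}_1\)), the 80% one-step interval can be reproduced approximately by hand:
# Approximate 80% one-step prediction interval
last_close <- 758.88    # last observed closing price (from above)
sigma_hat <- 11.17197   # one-step residual standard deviation (from above)
last_close + c(-1, 1) * qnorm(0.90) * sigma_hat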
Multi-Step Prediction Intervals
Prediction intervals generally widen as the forecast horizon increases, because \(\sigma_h\) increases with \(h\).
Benchmark Methods
Here are the formulas for \(\hat{\sigma}_h\) under the benchmark methods, where \(\hat{\sigma}\) is the residual standard deviation.
- Mean: \(\hat{\sigma}_h = \hat{\sigma} \sqrt{1 + 1/T}\)
- Naive: \(\hat{\sigma}_h = \hat{\sigma} \sqrt{h}\)
- Seasonal Naive: \(\hat{\sigma}_h = \hat{\sigma} \sqrt{k + 1}\)
- \(k\) is the integer part of \((h - 1)/m\)
- \(m\) is the seasonal period
- Drift: \(\hat{\sigma}_h = \hat{\sigma} \sqrt{h(1 + h/T)}\)
When \(h = 1\) and \(T\) is large, these all give the same approximate value for \(\hat{\sigma}\).
Prediction intervals can be computed using the fable package.
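The table below was presumably produced with something like the following, where hilo() exposes the 80% and 95% intervals as columns:
# Naive forecasts of beer production with 80% and 95% prediction intervals
aus_production %>%
model(naive = NAIVE(Beer)) %>%
forecast(h = '3 years') %>%
hilo(level = c(80, 95))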
## # A tsibble: 12 x 6 [1Q]
## # Key: .model [1]
## .model Quarter Beer .mean `80%`
## <chr> <qtr> <dist> <dbl> <hilo>
## 1 naive 2010 Q3 N(374, 4580) 374 [287.27351, 460.7265]80
## 2 naive 2010 Q4 N(374, 9159) 374 [251.35022, 496.6498]80
## 3 naive 2011 Q1 N(374, 13739) 374 [223.78531, 524.2147]80
## 4 naive 2011 Q2 N(374, 18319) 374 [200.54702, 547.4530]80
## 5 naive 2011 Q3 N(374, 22898) 374 [180.07367, 567.9263]80
## 6 naive 2011 Q4 N(374, 27478) 374 [161.56435, 586.4357]80
## 7 naive 2012 Q1 N(374, 32057) 374 [144.54327, 603.4567]80
## 8 naive 2012 Q2 N(374, 36637) 374 [128.70044, 619.2996]80
## 9 naive 2012 Q3 N(374, 41217) 374 [113.82052, 634.1795]80
## 10 naive 2012 Q4 N(374, 45796) 374 [ 99.74675, 648.2532]80
## 11 naive 2013 Q1 N(374, 50376) 374 [ 86.36077, 661.6392]80
## 12 naive 2013 Q2 N(374, 54956) 374 [ 73.57062, 674.4294]80
## # … with 1 more variable: `95%` <hilo>
aus_production %>%
model(naive = NAIVE(Beer)) %>%
forecast(h = "3 years") %>%
autoplot(filter_index(aus_production, '2000' ~ .))
Prediction Intervals from Bootstrapped Residuals
When a normal distribution is an unreasonable assumption, an alternative is to bootstrap the residuals. The only assumption then is that the residuals are uncorrelated.
A one-step forecast error is defined as \(e_t = y_t - \hat{y}_{t|t-1}\), which can be rewritten as \(y_t = \hat{y}_{t|t-1} + e_t\).
The next observation can be written as \(y_{T+1} = \hat{y}_{T+1|T} + e_{T+1}\), i.e. a one-step forecast plus a future error. If we assume that future errors will be similar to past errors, then \(e_{T+1}\) can be replaced by sampling from the errors (residuals) we’ve seen in the past. The same can then be done for \(T+2, \ldots\).
The generate() function can be used for this:
google_2015 %>% {
model(., naive = NAIVE(Close)) %>%
generate(h = 30, times = 5, bootstrap = TRUE) %>%
ggplot() +
geom_line(aes(day, .sim, colour = .rep)) +
geom_line(data = ., aes(day, Close)) +
labs(
x = 'Day',
y = 'Closing Price (USD)',
title = 'Google Stock Price',
subtitle = 'Naive Forecast - Bootstrapped'
)
}
Prediction intervals can be computed by calculating percentiles of the future sample paths for each forecast horizon. The result is a bootstrapped prediction interval. This is done within the forecast() function.
google_2015 %>%
model(NAIVE(Close)) %>%
forecast(h = 30, bootstrap = TRUE) %>%
autoplot(google_2015) +
labs(
title = 'Google Stock Closing Price',
subtitle = 'Naive Forecast with Bootstrapping'
)
The number of samples can be controlled using the times argument of forecast().
Forecasting Using Transformations
When forecasting with a model that uses a transformation, the transformation needs to be reversed (back-transformed) to obtain forecasts on the original scale.
The fable package will automatically back-transform the forecasts whenever a transformation has been used in the model definition.
Prediction Intervals with Transformations
If a transformation has been used, then the prediction interval is computed on the transformed scale and the end points are back-transformed. This preserves the probability coverage, although the interval will no longer be symmetric around the point forecast.
Forecasting with Constraints
Transformations can be used to ensure forecasts remain on an appropriate scale. For example, a log transformation constrains the forecasts to stay positive, and a scaled logit (logistic) transformation constrains the forecasts to lie between two bounds \(a\) and \(b\):
\[ f(x) = \log \bigg( \frac{ x - a }{ b - x } \bigg) \] Inverting this transformation:
\[ f^{-1}(x) = \frac{ a + be^x }{ 1 + e^x } = \frac{ (b - a)e^x }{ 1 + e^x } + a \]
The new_transformation() function can be used to use a custom transformation with a model. It is of the form:
logistic <- new_transformation(
transformation = function(x, lower = 0, upper = 1) {
log((x - lower)/(upper - x))
},
inverse = function(x, lower = 0, upper = 1) {
(upper - lower)*exp(x) / (1 + exp(x)) + lower
}
)
You could then use logistic(y, 0, 100) on the left-hand side of the formula.
Bias Adjustments
An issue with transformations, such as the Box-Cox transformation, is that the back-transformed point forecast will not be the mean of the forecast distribution; it will usually be the median.
This can be acceptable, but you may want the mean. For example, adding up sales forecasts from various regions to form a forecast for the entire country. Means add up, but medians don’t.
The reverse Box-Cox transformation:
$$ y_t = \begin{cases} \exp(w_t) & \lambda = 0; \\ (\lambda w_t + 1)^{1/\lambda} & \text{otherwise} \end{cases} $$
The simple back-transformed forecast given above is the median of the forecast distribution; the difference between it and the mean of the distribution is the bias. When the mean is used as the point forecast instead, we say the point forecasts have been bias-adjusted.
We can see how much difference the bias-adjustment makes:
eggs %>%
as_tsibble() %>%
model(rw = RW(log(value) ~ drift())) %>%
forecast(h = 50) %>%
autoplot(eggs, point_forecast = lst(mean, median)) +
labs(
x = 'Year',
y = 'Price (constant dollars)',
title = 'Cost of Eggs in the US'
)
## Warning: Ignoring unknown aesthetics: linetype
Bias-adjusted forecast means are automatically computed in the fable package when using mean() on a distribution. The median (the point forecast prior to bias adjustment) can be obtained using the median() function.
Forecasting with Decomposition
Assuming an additive decomposition, the decomposed time series can be written as:
\[ \hat{y}_t = \hat{S}_t + \hat{A}_t \] where \(\hat{A}_t = \hat{T}_t + \hat{R}_t\) is the seasonally adjusted component.
To forecast a decomposed series, we forecast the seasonal component \(\hat{S}_t\) and the seasonally adjusted component \(\hat{A}_t\) separately. The assumption is that the seasonal component is unchanging (or changing very slowly), so it is forecast by simply taking the last year of the estimated component - the seasonal naive method.
For the seasonally adjusted component, any non-seasonal forecasting method can be used:
- Random walk with drift
- Holt’s method
- Non-seasonal ARIMA
Example
In this example we forecast employment in the US retail sector:
us_employ_comp <-
us_employment %>%
filter_index('1990' ~ .) %>%
filter(Title == 'Retail Trade') %>%
model(STL = STL(Employed ~ trend(window = 7), robust = TRUE)) %>%
components() %>%
select(-.model)
us_employ_comp %>%
model(NAIVE(season_adjust)) %>%
forecast() %>%
autoplot(us_employ_comp) +
labs(
x = 'Month',
y = 'Persons (thousands)',
title = 'US Retail Trade',
subtitle = 'Naive Forecast of Seasonally Adjusted Data'
)
This shows naive forecasts of the seasonally adjusted data.
These are then ‘reseasonalised’ by adding in the seasonal naive forecasts of the seasonal component. The decomposition_model() function makes this easier. It allows forecasts to be computed via any additive decomposition. Seasonal components will be forecast automatically using SNAIVE() if no model is specified.
us_retail <-
us_employment %>%
filter_index('1990' ~ .) %>%
filter(Title == 'Retail Trade')
us_decomp_fit <-
us_retail %>%
model(stlf = decomposition_model(
STL(Employed ~ trend(window = 7), robust = TRUE),
NAIVE(season_adjust)
))
us_decomp_fit %>%
forecast() %>%
autoplot(us_retail)
The upper and lower bounds of the prediction intervals are ‘reseasonalised’ by adding in the forecasts of the seasonal component.
The ACF of the residuals shows significant autocorrelations, because the naive method does not capture the changing trend in the seasonally adjusted series:
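Presumably the residual diagnostics were produced along these lines (a sketch; the original chunk isn't shown):
# Residual diagnostics of the decomposition model
us_decomp_fit %>% gg_tsresiduals()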
## Warning: Residuals of type `innovation` are not supported for STL decomposition model models.
## Defaulting to `type="response"`
## Warning: Removed 12 row(s) containing missing values (geom_path).
## Warning: Removed 12 rows containing missing values (geom_point).
## Warning: Removed 12 rows containing non-finite values (stat_bin).
As we move along in this study, more suitable methods for forecasting the seasonally adjusted component will be introduced, rather than relying on the naive method.
Evaluating Forecast Accuracy
Training and Test Sets
It’s common to split data into training and test sets, with the test set being around 20% of the data.
Some notes:
- A model which fits the training data well will not necessarily forecast well.
- A perfect fit can always be obtained by using a model with enough parameters.
- Overfitting a model to data is just as bad as failing to identify a systematic pattern in the data.
Function to Subset Time Series
The filter() function can be used:
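The exact calls behind the output below aren't shown; the following are plausible (assumed) examples of the kinds of subsetting illustrated:
# First few rows from 1995 onwards
aus_production %>% filter(year(Quarter) >= 1995)
# All first-quarter observations
aus_production %>% filter(quarter(Quarter) == 1)
# The last four observations
aus_production %>% slice(n() - 3:0)
# One grouped series per key
aus_retail %>% group_by(State, Industry)
# The day each stock reached its highest closing price
gafa_stock %>% group_by(Symbol) %>% filter(Close == max(Close))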
# A tsibble: 3 x 7 [1Q]
Quarter Beer Tobacco Bricks Cement Electricity Gas
<qtr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1995 Q1 426 4714 430 1626 41768 131
2 1995 Q2 408 3939 457 1703 43686 167
3 1995 Q3 416 6137 417 1733 46022 181
# A tsibble: 3 x 7 [1Q]
Quarter Beer Tobacco Bricks Cement Electricity Gas
<qtr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1956 Q1 284 5225 189 465 3923 5
2 1957 Q1 262 5577 187 529 4339 5
3 1958 Q1 272 5758 199 554 4608 5
# A tsibble: 4 x 7 [1Q]
Quarter Beer Tobacco Bricks Cement Electricity Gas
<qtr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2009 Q3 419 NA NA 2325 58394 252
2 2009 Q4 488 NA NA 2273 57336 210
3 2010 Q1 414 NA NA 1904 58309 205
4 2010 Q2 374 NA NA 2401 58041 236
# A tsibble: 64,532 x 5 [1M]
# Key: State, Industry [152]
# Groups: State, Industry [152]
State Industry `Series ID` Month Turnover
<chr> <chr> <chr> <mth> <dbl>
1 Australian Capit… Cafes, restaurants and … A3349849A 1982 Apr 4.4
2 Australian Capit… Cafes, restaurants and … A3349849A 1982 May 3.4
3 Australian Capit… Cafes, restaurants and … A3349849A 1982 Jun 3.6
4 Australian Capit… Cafes, restaurants and … A3349849A 1982 Jul 4
5 Australian Capit… Cafes, restaurants and … A3349849A 1982 Aug 3.6
6 Australian Capit… Cafes, restaurants and … A3349849A 1982 Sep 4.2
7 Australian Capit… Cafes, restaurants and … A3349849A 1982 Oct 4.8
8 Australian Capit… Cafes, restaurants and … A3349849A 1982 Nov 5.4
9 Australian Capit… Cafes, restaurants and … A3349849A 1982 Dec 6.9
10 Australian Capit… Cafes, restaurants and … A3349849A 1983 Jan 3.8
# … with 64,522 more rows
# A tsibble: 4 x 8 [1D]
# Key: Symbol [4]
# Groups: Symbol [4]
Symbol Date Open High Low Close Adj_Close Volume
<chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 AAPL 2018-10-03 230. 233. 230. 232. 230. 28654800
2 AMZN 2018-09-04 2026. 2050. 2013 2040. 2040. 5721100
3 FB 2018-07-25 216. 219. 214. 218. 218. 58954200
4 GOOG 2018-07-26 1251 1270. 1249. 1268. 1268. 2405600
Forecast Errors
An error is the difference between an observed value and a forecast:
\[ e_{T+h} = y_{T+h} - \hat{y}_{T+h|T} \]
where the training data is \(\{y_1,\ldots,y_T\}\) and the test data is \(\{y_{T+1},y_{T+2}, \ldots\}\).
- Residuals are calculated on the training set, and forecast errors on the test set.
- Residuals are based on one-step forecasts, while forecast errors can involve multi-step forecasts.
Scale Dependence
Forecast errors are on the same scale as the data, and therefore cannot be compared between data sets. The two most commonly used scale-dependent measures are:
- Mean Absolute Error - \(MAE = mean(|e_t|)\)
- Root Mean Squared Error - \(RMSE = \sqrt{mean(e^2_t)}\)
A forecast method that minimises the MAE will lead to forecasts of the median, while minimising the RMSE will lead to forecasts of the mean.
Percentage Errors
Percentage errors are given by \(p_t = 100 e_t / y_t\). They are unit-free, so they can be used to compare forecast performance between data sets. The most commonly used measure is the mean absolute percentage error:
\[ MAPE = mean(|p_t|) \]
The disadvantage is that the percentage error is infinite or undefined if \(y_t = 0\). Percentage errors also have the disadvantage of putting a heavier penalty on negative errors than on positive errors. The symmetric MAPE (sMAPE) was proposed to resolve this:
\[ sMAPE = mean(200|y_t - \hat{y}_t| / (y_t + \hat{y}_t)) \]
Hyndman and Koehler recommend that sMAPE not be used.
Scaled Errors
These were proposed by Hyndman and Koehler as an alternative to using percentage errors. They proposed scaling the errors based on the training MAE.
For non-seasonal time series, scaled errors use naive forecasts:
\[ q_j = \frac{ e_j }{ \frac{1}{T - 1} \sum_{t=2}^T |y_t - y_{t-1}| } \]
Because the numerator and denominator both involve values on the scale of the original data, \(q_j\) is independent of the scale of the data. A scaled error is less than one if it arises from a better forecast than the average naive forecast.
For seasonal time series, a scaled error can be defined using seasonal naive forecasts:
\[ q_j = \frac{ e_j }{ \frac{1}{T - m} \sum_{t=m+1}^T |y_t - y_{t-m}| } \]
The mean absolute scaled error is simply
\[ MASE = mean(|q_j|) \]
Examples
beer_post_1992 <-
aus_production %>%
filter_index('1992' ~ .)
# Extract training set
beer_train <-
beer_post_1992 %>%
filter_index(. ~ '2007')
# Train the models
beer_fit <-
beer_train %>%
model(
mean = MEAN(Beer),
naive = NAIVE(Beer),
`seasonal naive` = SNAIVE(Beer),
drift = RW(Beer ~ drift())
)
# Forecast based on the models
beer_forecast <-
beer_fit %>%
forecast(h = 13)
# Plot the forecasts against the full data
beer_forecast %>%
autoplot(filter_index(aus_production, '1992' ~ .), level = NULL)
The accuracy of the forecasts can be measured using the accuracy() function:
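Presumably the call compared the forecasts against the full post-1992 series (a sketch; the original chunk isn't shown):
# Test-set accuracy of the four benchmark methods
beer_forecast %>% accuracy(beer_post_1992)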
## # A tibble: 4 x 9
## .model .type ME RMSE MAE MPE MAPE MASE ACF1
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 drift Test -4.67 37.3 31.4 -1.85 7.37 2.17 -0.0646
## 2 mean Test -15.8 40.2 37.4 -4.52 8.98 2.58 -0.0701
## 3 naive Test -6.54 37.5 32.4 -2.30 7.64 2.23 -0.0701
## 4 seasonal naive Test -3.62 10.2 9 -0.840 2.16 0.621 0.352
It’s already obvious from the graph that the seasonal naive is the best model.
Time Series Cross-Validation
In time series cross-validation there is a series of test sets, each consisting of a single observation. The corresponding training set consists only of observations that occurred prior to the test observation. Since a reasonable number of observations are needed to train a model, the earliest observations are not used as test sets.
The forecast accuracy is calculated by averaging over all of the test sets.
The cross-validation procedure can be modified to allow multi-step errors to be used.
Example
google_2015_train <-
google_2015 %>%
slice(1:(n() - 1)) %>%
stretch_tsibble(.init = 3, .step = 1)
google_2015_fc <-
google_2015_train %>%
model(RW(Close ~ drift())) %>%
forecast(h = 1)
google_2015_fc %>% accuracy(google_2015)
## # A tibble: 1 x 10
## .model Symbol .type ME RMSE MAE MPE MAPE MASE ACF1
## <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 RW(Close ~ drift… GOOG Test 0.726 11.3 7.26 0.112 1.19 1.02 0.0985
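The second table below reports training-set (residual) accuracy; presumably it came from fitting the same model to the full series and calling accuracy() on the fitted model:
# Training-set (residual) accuracy for comparison
google_2015 %>%
model(RW(Close ~ drift())) %>%
accuracy()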
## # A tibble: 1 x 10
## Symbol .model .type ME RMSE MAE MPE MAPE MASE ACF1
## <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 GOOG RW(Close ~… Trai… -2.97e-14 11.1 7.16 -0.0267 1.18 1.00 0.0976
Exercises
- Produce forecasts for the following series using whichever of NAIVE(y), SNAIVE(y) or RW(y ~ drift()) is more appropriate in each case:
- Australian Population (global_economy)
The population trends upwards without any seasonal pattern, so the drift method would be the most appropriate.
global_economy %>%
filter(Code == 'AUS') %>%
model(Drift = RW(Population ~ drift())) %>%
forecast(h = 10) %>%
autoplot(global_economy) +
labs(
title = 'Australian Population',
subtitle = 'Drift Forecast'
)
- Bricks (aus_production)
We’ll try a seasonal naive:
aus_production %>%
drop_na(Bricks) %>%
model(snaive = SNAIVE(Bricks)) %>%
forecast(h = 20) %>%
autoplot(aus_production)
## Warning: Removed 20 row(s) containing missing values (geom_path).
- NSW Lambs (aus_livestock)
nsw_lambs <-
aus_livestock %>%
filter(Animal == 'Lambs' & State == 'New South Wales')
nsw_lambs %>%
model(drift = RW(Count ~ drift())) %>%
forecast(h = 30) %>%
autoplot(nsw_lambs)
- Use the Facebook stock price (data set gafa_stock) to do the following:
- Produce a time plot of the series.
- Produce forecasts using the drift method and plot them.
google_forecast <-
google_stock %>%
filter_index('2018' ~ .) %>%
fill_gaps() %>%
model(drift = RW(Close ~ drift())) %>%
forecast(h = 100)
google_forecast %>% autoplot(filter_index(google_stock, '2018' ~ .))
- Show that the forecasts are identical to extending the line drawn between the first and last observations.
google_forecast %>%
autoplot(filter_index(google_stock, '2018' ~ .)) +
geom_segment(
aes(
x = Date[1],
y = Close[1],
xend = Date[length(Date)],
yend = Close[length(Close)]
),
color = 'red',
linetype = 'dashed'
)
Try using some of the other benchmark functions to forecast the same data set. Which do you think is best? Why?
google_stock %>%
filter_index('2016' ~ .) %>%
fill_gaps() %>%
model(
Mean = MEAN(Close),
Naive = NAIVE(Close),
`Seasonal Naive` = SNAIVE(Close),
Drift = RW(Close ~ drift())
) %>%
forecast(h = 400) %>%
autoplot(filter_index(google_stock, '2016' ~ .), level = NULL)
## Warning: Removed 2 row(s) containing missing values (geom_path).
Which one is going to be best? Stock price is such a random process that it would be hard to even guess.
- Produce forecasts for all of the Victorian series in aus_livestock using SNAIVE(). Plot the resulting forecasts including the historical data. Is this a reasonable benchmark for these series?
vic_livestock <-
aus_livestock %>%
filter(Animal == 'Calves' | Animal == 'Lambs') %>%
filter(State == 'Victoria')
vic_livestock %>%
model(`Seasonal Naive` = SNAIVE(Count)) %>%
forecast(h = 50) %>%
autoplot(vic_livestock)
From a purely visual perspective, the seasonal naive does look like a reasonable benchmark for these series.
- Calculate the residuals from a seasonal naïve forecast applied to the quarterly Australian beer production data from 1992.
post_1992_beer <-
aus_production %>%
filter_index('1992' ~ .)
beer_fit <-
post_1992_beer %>%
model(snaive = SNAIVE(Beer))
beer_fit %>% gg_tsresiduals()
## Warning: Removed 4 row(s) containing missing values (geom_path).
## Warning: Removed 4 rows containing missing values (geom_point).
## Warning: Removed 4 rows containing non-finite values (stat_bin).
We can conclude that beer production is highly seasonal, with significant autocorrelation at a lag of 4 quarters (1 year).
- Repeat the previous exercise using the Australian Exports series from global_economy and the Bricks series from aus_production. Use whichever of NAIVE() or SNAIVE() is more appropriate in each case.
# Looking at the series
aus_economy <-
global_economy %>%
filter(Code == 'AUS')
aus_economy %>%
autoplot(Exports)
# There doesn't appear to be seasonality, only trend and cycle.
# Naive is more appropriate in this case
aus_export_model <-
aus_economy %>%
model(naive = NAIVE(Exports))
aus_export_model %>% gg_tsresiduals()
## Warning: Removed 1 row(s) containing missing values (geom_path).
## Warning: Removed 1 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing non-finite values (stat_bin).
- Are the following statements true or false? Explain your answer.
- Good forecast methods should have normally distributed residuals.
- A good forecast method doesn’t require normally distributed residuals; however, normality (together with constant variance) makes the calculation of prediction intervals easier.
- A model with small residuals will give good forecasts.
- A model with small residuals means it has been fit well, however it doesn’t necessarily mean it will give good forecasts.
- The best measure of forecast accuracy is MAPE.
- No single accuracy measure is best in all situations; MAPE in particular is undefined when \(y_t = 0\) and penalises negative errors more heavily than positive ones.
- If your model doesn’t forecast well, you should make it more complicated.
- Making a model more complicated generally means making it more flexible. You may be able to get a better fit to the training data, but these high variance, low bias models will likely perform worse on a test set.
- Always choose the model with the best forecast accuracy as measured on the test set.
- If you want to use the model for forecasting, then this is correct.
- For your retail time series (from Exercise 6 in Section 2.10):
- Create a training dataset consisting of observations before 2011:
aus_retail_hospitality <- aus_retail %>%
filter(`Series ID` == 'A3349640L')
aus_retail_hospitality_train <-
aus_retail_hospitality %>%
filter_index(. ~ '2011')
- Check that your data have been split appropriately by producing the following plot.
aus_retail_hospitality %>%
autoplot(Turnover) +
autolayer(aus_retail_hospitality_train, Turnover, colour = 'blue')
- Calculate seasonal naïve forecasts using SNAIVE() applied to your training data
aus_retail_model <-
aus_retail_hospitality_train %>%
model(`seasonal naive` = SNAIVE(Turnover))
aus_retail_forecast <-
aus_retail_model %>%
forecast()
- Compare the accuracy of your forecasts against the actual values.
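The two tables below (training-set and test-set accuracy) were presumably produced with calls along these lines:
# Training-set accuracy of the fitted model
aus_retail_model %>% accuracy()
# Test-set accuracy of the forecasts against the full series
aus_retail_forecast %>% accuracy(aus_retail_hospitality)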
## # A tibble: 1 x 11
## State Industry .model .type ME RMSE MAE MPE MAPE MASE ACF1
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Victo… Cafes, res… seaso… Trai… 13.1 29.8 20.6 7.46 11.5 1 0.820
## # A tibble: 1 x 11
## .model State Industry .type ME RMSE MAE MPE MAPE MASE ACF1
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 season… Victo… Cafes, re… Test -28.9 37.1 30.3 -7.79 8.16 1.47 0.346
- Check the residuals.
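The plots behind the warnings below were presumably produced with gg_tsresiduals() on the fitted model:
# Residual diagnostics of the seasonal naive model
aus_retail_model %>% gg_tsresiduals()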
## Warning: Removed 12 row(s) containing missing values (geom_path).
## Warning: Removed 12 rows containing missing values (geom_point).
## Warning: Removed 12 rows containing non-finite values (stat_bin).
The residuals are correlated, with a 24-month seasonal pattern. The residuals look reasonably normal, with the exception that there seems to be a bias towards positive residuals.
- How sensitive are the accuracy measures to the amount of training data used?
snaive_accuracy <- function(var, train, test) {
train %>%
model(SNAIVE({{ var }})) %>%
forecast() %>%
accuracy(test) %>%
pull(RMSE)
}
years <- c(1987:2011)
years %>%
as.character() %>%
map(~filter_index(aus_retail_hospitality, . ~ .x)) %>%
map_dbl(~snaive_accuracy(Turnover, .x, aus_retail_hospitality)) %>% {
tibble(
rmse = .,
year_range = years
)
} %>%
ggplot(aes(years, rmse)) +
geom_line() +
geom_point() +
labs(
x = 'Year Range: 1982 - x',
y = 'RMSE of SNAIVE Model',
title = 'RMSE of Test Error'
)
We can see that the RMSE is fairly stable for training sets ending up to around 1996; as the training window extends beyond that, the RMSE starts to increase.
- Consider the number of pigs slaughtered in New South Wales (data set aus_livestock).
- Produce some plots of the data in order to become familiar with it.
au_pigs <-
aus_livestock %>%
filter(Animal == 'Pigs' & State == 'New South Wales')
au_pigs %>% gg_tsdisplay(Count)
- Create a training set of 486 observations, withholding a test set of 72 observations (6 years).
au_pigs_train <- au_pigs %>%
mutate(id = 1:n()) %>%
filter(id <= 486) %>%
select(-id)
au_pigs_test <- au_pigs %>%
mutate(id = 1:n()) %>%
filter(id > 486) %>%
select(-id)
- Try using various benchmark methods to forecast the training set and compare the results on the test set. Which method did best?
au_pigs_models <-
au_pigs_train %>%
model(
mean = MEAN(Count),
naive = NAIVE(Count),
snaive = SNAIVE(Count),
drift = RW(Count ~ drift())
)
au_pigs_models %>%
forecast(au_pigs_test) %>%
accuracy(au_pigs_test)
## # A tibble: 4 x 11
## .model Animal State .type ME RMSE MAE MPE MAPE MASE
## <chr> <fct> <fct> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 drift Pigs New … Test -4685. 8091. 6967. -7.36 10.1 NaN
## 2 mean Pigs New … Test -39360. 39894. 39360. -55.9 55.9 NaN
## 3 naive Pigs New … Test -6138. 8941. 7840. -9.39 11.4 NaN
## 4 snaive Pigs New … Test -5838. 10111. 8174. -8.81 11.9 NaN
## # … with 1 more variable: ACF1 <dbl>
Judging by RMSE and MAE, the drift method performed best on the test data, with the naive and seasonal naive methods close behind. (The MASE values are NaN, likely because accuracy() was given only the test data, so there is no training portion from which to compute the scaling factor.)
- Check the residuals of your preferred method. Do they resemble white noise?
Recall that residuals come from the training set, not the test set; the corresponding quantities on the test set are the forecast errors.
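The warnings below (12 missing fitted values) suggest the seasonal naive residuals were plotted; a sketch, assuming that model:
# Residual diagnostics of the seasonal naive model on the training set
au_pigs_models %>%
select(snaive) %>%
gg_tsresiduals()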
## Warning: Removed 12 row(s) containing missing values (geom_path).
## Warning: Removed 12 rows containing missing values (geom_point).
## Warning: Removed 12 rows containing non-finite values (stat_bin).
The residuals do not look like white noise; there is some autocorrelation within the first six lags.
- Create a training set for household wealth (hh_budget) by withholding the last four years as a test set.
library(tsibbledata)
hh_budget_train <-
hh_budget %>%
filter_index(. ~ '2012')
hh_budget_test <-
hh_budget %>%
filter_index('2013' ~ .)
- Fit all the appropriate benchmark methods to the training set and forecast the periods covered by the test set.
hh_budget_models <-
hh_budget_train %>%
model(
mean = MEAN(Wealth),
naive = NAIVE(Wealth),
snaive = SNAIVE(Wealth),
drift = RW(Wealth ~ drift())
)
## Warning: 4 errors (1 unique) encountered for snaive
## [4] Non-seasonal model specification provided, use RW() or provide a different lag specification.
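The forecasting step isn't shown but is needed for the accuracy calculation below; presumably it was along these lines:
# Forecast over the test period (the snaive specification above failed for annual data)
hh_budget_forecast <-
hh_budget_models %>%
forecast(hh_budget_test)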
- Compute the accuracy of your forecasts. Which method does best?
hh_budget_forecast %>% accuracy(hh_budget_test) %>%
group_by(Country) %>%
arrange(RMSE) %>%
slice(1)
## # A tibble: 4 x 10
## # Groups: Country [4]
## .model Country .type ME RMSE MAE MPE MAPE MASE ACF1
## <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 drift Australia Test 29.1 35.5 29.1 7.23 7.23 NaN 0.210
## 2 drift Canada Test 33.3 37.2 33.3 6.09 6.09 NaN -0.229
## 3 drift Japan Test 14.7 17.9 14.7 2.44 2.44 NaN -0.229
## 4 drift USA Test 75.9 76.2 75.9 12.7 12.7 NaN -0.561
The drift model had the best result (measured using RMSE) for all four countries in the series.
- Do the residuals from the best method resemble white noise?
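gg_tsresiduals() needs a single model and key, so the plots behind the warnings below were presumably produced one country at a time; a sketch, assuming the Australian series:
# Residual diagnostics of the drift model for one country
hh_budget_models %>%
filter(Country == 'Australia') %>%
select(drift) %>%
gg_tsresiduals()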
## Warning: Removed 1 row(s) containing missing values (geom_path).
## Warning: Removed 1 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing non-finite values (stat_bin).
## Warning: Removed 1 row(s) containing missing values (geom_path).
## Warning: Removed 1 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing non-finite values (stat_bin).
The residuals do appear to be white noise.
- Create a training set for Australian takeaway food turnover (aus_retail) by withholding the last four years as a test set.
aus_takeaway_turnover <-
aus_retail %>%
filter(Industry == 'Takeaway food services')
aus_taway_train <-
aus_takeaway_turnover %>%
filter_index(. ~ '1994')
aus_taway_test <-
aus_takeaway_turnover %>%
filter_index('1995' ~ .)
- Fit all the appropriate benchmark methods to the training set and forecast the periods covered by the test set.
aus_taway_models <-
aus_taway_train %>%
model(
mean = MEAN(Turnover),
naive = NAIVE(Turnover),
snaive = SNAIVE(Turnover),
drift = RW(Turnover ~ drift())
)
aus_taway_forecast <-
aus_taway_models %>%
forecast(aus_taway_test)
- Compute the accuracy of your forecasts. Which method does best?
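The table below (best method per state, ranked by RMSE) was presumably produced with something like:
# Best benchmark method per state by RMSE
aus_taway_forecast %>%
accuracy(aus_taway_test) %>%
group_by(State) %>%
arrange(RMSE) %>%
slice(1)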
## # A tibble: 8 x 11
## # Groups: State [8]
## .model State Industry .type ME RMSE MAE MPE MAPE MASE ACF1
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 drift Austr… Takeawa… Test 0.496 2.69 2.00 -1.05 13.0 NaN 0.901
## 2 drift New S… Takeawa… Test 42.8 87.7 60.8 6.68 16.9 NaN 0.934
## 3 snaive North… Takeawa… Test 7.89 9.87 7.89 54.3 54.3 NaN 0.966
## 4 drift Queen… Takeawa… Test 24.2 35.0 26.5 9.48 11.0 NaN 0.804
## 5 drift South… Takeawa… Test 2.81 8.29 5.83 1.84 9.14 NaN 0.807
## 6 drift Tasma… Takeawa… Test -2.48 3.54 2.94 -17.7 19.6 NaN 0.818
## 7 drift Victo… Takeawa… Test 39.2 63.1 47.0 11.9 18.4 NaN 0.938
## 8 drift Weste… Takeawa… Test 6.22 19.0 15.6 0.159 15.1 NaN 0.906
The drift method appears to have performed best for all states except the Northern Territory, where the seasonal naive method did best.
- Do the residuals from the best method resemble white noise?
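The plots behind the warnings below aren't shown; a sketch, assuming the drift model for a single state (the Victorian series here):
# Residual diagnostics of the drift model for one state
aus_taway_models %>%
filter(State == 'Victoria') %>%
select(drift) %>%
gg_tsresiduals()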
## Warning: Removed 1 row(s) containing missing values (geom_path).
## Warning: Removed 1 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing non-finite values (stat_bin).
## Warning: Removed 1 row(s) containing missing values (geom_path).
## Warning: Removed 1 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing non-finite values (stat_bin).
The residuals don’t look like white noise; there is still some six-month seasonality that the model hasn’t captured.