library(tidyverse)
library(forecast)
library(fma)
library(magrittr)
library(fpp3)
library(tsibble)
library(lubridate)
library(feasts)
Simple Forecasting Methods
Some methods are extremely simple yet surprisingly effective. The following methods can be considered the benchmark against which other models will be tested.
Average Method
Here, the forecasts of all future values are equal to the average (mean) of the historical data.
\[ \hat{y}_{T + h|T} = \bar{y} = (y_1 + \ldots + y_T)/T \]
bricks <-
aus_production %>%
filter(between(year(Quarter), 1970, 2004))
bricks %>%
model(MEAN(Bricks)) %>%
forecast() %>%
autoplot(bricks, level = NULL)
Naive Method
For naive forecasts, we simply set all forecasts to be the value of the last observation:
\[ \hat{y}_{T + h|T} = y_T \]
Because a naive forecast is optimal when data follow a random walk, these are also called random walk forecasts.
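As a minimal sketch (reusing the bricks series defined above, and not part of the original notes), the naive method can be fitted and plotted with NAIVE() in the same way as the mean method:
# Naive forecasts of the bricks series (illustrative sketch)
bricks %>%
model(naive = NAIVE(Bricks)) %>%
forecast(h = '5 years') %>%
autoplot(bricks, level = NULL)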
Seasonal Naive Method
A similar method is useful for highly seasonal data. Each forecast is set to be the last observed value from the same season of the year.
\[ \hat{y}_{T + h|T} = y_{T + h - m(k+1)} \]
Where \(m\) is the seasonal period, and \(k\) is the integer part of \((h - 1)/m\), i.e. the number of complete years in the forecast period prior to time \(T + h\).
e.g. for monthly data, the forecast for all future February values is equal to the last observed February value.
bricks %>%
model(snaive = SNAIVE(Bricks ~ lag('year'))) %>%
forecast(h = '5 years') %>%
autoplot(bricks)
Drift Method
A variation on the naive method is to allow the forecasts to increase or decrease over time, where the amount of change over time is called drift. This is set to the average change in the historical data.
\[ \hat{y}_{T + h|T} = y_T + \frac{h}{T-1} \sum_{t=2}^T (y_t - y_{t-1} ) = y_T + h \bigg( \frac{ y_T - y_1 }{ T - 1 } \bigg) \]
This is equivalent to drawing a line between the first and last observations and extrapolating it into the future.
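A minimal sketch of the drift method on the same bricks series, using fable's RW() with drift() (this chunk is an illustration added here, not from the original notes):
# Random walk with drift forecasts (illustrative sketch)
bricks %>%
model(drift = RW(Bricks ~ drift())) %>%
forecast(h = '5 years') %>%
autoplot(bricks, level = NULL)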
Examples
Applying the first three methods to the quarterly beer production data:
ausbeer_training <-
aus_production %>%
filter_index("1992 Q1" ~ "2006 Q4")
beer_fc <-
ausbeer_training %>%
model(
mean = MEAN(Beer),
naive = NAIVE(Beer),
seasonal_naive = SNAIVE(Beer)
) %>%
forecast(h = 14)
beer_fc %>%
autoplot(ausbeer_training, level = NULL) +
autolayer(
filter_index(aus_production, "2007 Q1" ~ .),
Beer,
colour = 'black'
) +
labs(
x = 'Year',
y = 'Megalitres',
title = 'Forecasts for Quarterly Beer Production',
colour = 'Forecast Method'
)
Non-seasonal methods applied to the daily Google stock closing prices for 2015:
google_stock <-
gafa_stock %>%
filter(Symbol == 'GOOG') %>%
mutate(day = row_number()) %>%
update_tsibble(index = day, regular = TRUE)
google_2015 <-
google_stock %>%
filter(year(Date) == 2015)
google_fit <-
google_2015 %>%
model(
mean = MEAN(Close),
naive = NAIVE(Close),
drift = NAIVE(Close ~ drift())
)
google_jan_2016 <-
google_stock %>%
filter(yearmonth(Date) == yearmonth('2016 Jan'))
# Use the filtered data as the forecast horizon, rather than
# using h = <forecast>
google_fit %>%
forecast(google_jan_2016) %>%
autoplot(google_2015, level = NULL) +
autolayer(google_jan_2016, Close, colour = 'black') +
labs(
x = 'Day',
y = 'Closing Price (USD)',
title = 'Google Stock',
subtitle = 'Closing Price Forecasts'
) +
guides(
colour = guide_legend(title = 'Forecast')
)
Sometimes these methods are the best forecasting methods available, but mainly they are used as benchmarks - i.e. if a more complex forecasting method can’t perform better than these, it shouldn’t be considered.
Fitted Values and Residuals
Each observation can be forecast using all previous observations. These are called fitted values and are denoted by \(\hat{y}_{t|t-1}\), or simply \(\hat{y}_t\). They always involve one-step forecasts.
They are often not true forecasts as they are estimated using all available observations, including future ones. For example, in the average method, \(\hat{y}_t = \hat{c}\), where \(\hat{c}\) is the average of all available observations, including those after \(t\).
Residuals
Residuals in a time series model are what is ‘left over’ after fitting a model. For many (but not all) models, the residuals are \(e_t = y_t - \hat{y}_t\).
The augment() function can be used to calculate the residuals:
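Presumably along these lines, using the google_fit mable from earlier (the original chunk isn't shown in these notes):
# Fitted values and residuals for each benchmark model
google_fit %>% augment()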
## # A tsibble: 756 x 6 [1]
## # Key: Symbol, .model [3]
## Symbol .model day Close .fitted .resid
## <chr> <chr> <int> <dbl> <dbl> <dbl>
## 1 GOOG mean 253 522. 602. -79.6
## 2 GOOG mean 254 511. 602. -90.5
## 3 GOOG mean 255 499. 602. -102.
## 4 GOOG mean 256 498. 602. -103.
## 5 GOOG mean 257 500. 602. -102.
## 6 GOOG mean 258 493. 602. -108.
## 7 GOOG mean 259 490. 602. -112.
## 8 GOOG mean 260 493. 602. -108.
## 9 GOOG mean 261 498. 602. -103.
## 10 GOOG mean 262 499. 602. -103.
## # … with 746 more rows
Residual Diagnostics
A good forecasting method will yield residuals with the following properties:
- The residuals are uncorrelated.
- The residuals have a zero mean.
Adjusting the bias is easy - if the residuals have mean \(m\), add \(m\) to all of the forecasts. Fixing correlation is harder.
It is also useful (but not necessary) for the residuals to have:
- Constant variance
- A normal distribution
These two make the calculation of prediction intervals easier.
Example
For stock market prices and indexes, the naive method is often the best. Each forecast is equal to the last observed value, so the residuals are equal to the differences between consecutive observations:
\[ e_t = y_t - \hat{y_t} = y_t - y_{t-1} \]
Let’s take a look at the Google daily closing stock price:
google_2015 %>%
autoplot(Close) +
labs(
x = 'Day',
y = 'Closing Price (USD)',
title = 'Google Closing Stock Price (2015)'
)
Let’s take a look at the residuals from forecasting using the naive method:
google_2015 %>%
model(naive = NAIVE(Close)) %>%
augment() %>%
autoplot(.resid) +
labs(
x = 'Day',
y = 'Naive Residual (USD)',
title = 'Google Stock Price 2015',
subtitle = 'Naive Model Residuals'
)
## Warning: Removed 1 row(s) containing missing values (geom_path).
google_2015 %>%
model(naive = NAIVE(Close)) %>%
augment() %>%
ggplot() +
geom_histogram(aes(.resid), binwidth = 2) +
labs(
x = 'Residual',
y = 'Count',
title = 'Google Stock Price (2015)',
subtitle = 'Naive Model - Residual Histogram'
)
## Warning: Removed 1 rows containing non-finite values (stat_bin).
We see the right tail is a little too long for a normal distribution.
google_2015 %>%
model(naive = NAIVE(Close)) %>%
augment() %>%
ACF(.resid) %>%
autoplot() +
labs(
x = 'Lag',
y = 'Autocorrelation Coefficient',
title = 'Google Naive Model - Residual ACF'
)
The graphs show that the naive method appears to account for all available information. The mean of the residuals is close to zero and there is no significant correlation between residuals, as shown in the ACF plot. The time plot shows that the residual variation is reasonably constant, with the exception of one outlier.
The gg_tsresiduals() function is a convenient shortcut.
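Presumably it was called along these lines (a sketch; the original chunk isn't shown):
# Time plot, ACF and histogram of the naive residuals in a single figure
google_2015 %>%
model(naive = NAIVE(Close)) %>%
gg_tsresiduals()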
## Warning: Removed 1 row(s) containing missing values (geom_path).
## Warning: Removed 1 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing non-finite values (stat_bin).
Portmanteau Test for Autocorrelation
In addition to the ACF plot, there is a more formal test for autocorrelation, in which the whole set of \(r_k\) values is treated as a group rather than individually. Recall that \(r_k\) is the autocorrelation for lag \(k\).
The ACF plot is essentially a set of multiple hypothesis tests, each with some probability of a false positive. With enough tests it is likely that at least one will give a false positive, leading us to conclude there is some autocorrelation when there is none.
To overcome this, the first \(\ell\) autocorrelations are tested to see whether they are significantly different from what would be expected from a white noise process.
One such test is the Box-Pierce test:
\[ Q = T \sum_{k=1}^\ell r^2_k \]
where \(\ell\) is the maximum lag being considered and \(T\) is the number of observations. If each \(r_k\) is close to zero, \(Q\) will be small. If some \(r_k\) values are large, then \(Q\) will be large. A suggestion is to use \(\ell = 10\) for non-seasonal data and \(\ell = 2m\) for seasonal data, where \(m\) is the seasonal period. However, the test is not good when \(\ell\) is large, so if that value is larger than \(T/5\), use \(\ell = T/5\).
A related and more accurate test is the Ljung-Box test:
\[ Q^* = T(T + 2) \sum_{k=1}^\ell (T - k)^{-1} r^2_k \]
Again, large values of \(Q^*\) suggest the autocorrelations do not come from a white noise series.
How large is too large? If the autocorrelations did come from a white noise series, then both \(Q\) and \(Q^*\) would have a \(\chi^2\) distribution with \((\ell - K)\) degrees of freedom, where \(K\) is the number of parameters in the model. If calculated from raw data (rather than residuals), then \(K = 0\).
For the Google stock price, the naive model has no parameters, so \(K = 0\) here as well.
google_2015 %>%
model(naive = NAIVE(Close)) %>%
augment() %>%
features(.resid, ljung_box, lag = 10, dof = 0)
# A tibble: 1 x 4
Symbol .model lb_stat lb_pvalue
<chr> <chr> <dbl> <dbl>
1 GOOG naive 7.91 0.637
For this test, the results are not significant, so we would say that the series is not distinguishable from white noise.
An alternative approach could be the drift method:
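The code behind the parameter estimate below isn't shown; presumably the drift model was fitted and tidied along these lines:
# Estimate of the drift coefficient
google_2015 %>%
model(drift = RW(Close ~ drift())) %>%
tidy()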
## # A tibble: 1 x 7
## Symbol .model term estimate std.error statistic p.value
## <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 GOOG drift b 0.944 0.705 1.34 0.182
The estimated parameter is the drift coefficient which measures the average daily change observed in the historical data.
Applying the Ljung-Box test:
google_2015 %>%
model(drift = RW(Close ~ drift())) %>%
augment() %>%
features(.resid, ljung_box, lag = 10, dof = 1)
## # A tibble: 1 x 4
## Symbol .model lb_stat lb_pvalue
## <chr> <chr> <dbl> <dbl>
## 1 GOOG drift 7.91 0.543
Again, the p-value is high, so the residuals are indistinguishable from white noise.
Forecast Distributions
Uncertainty in the forecasts is expressed using a probability distribution. The point forecast is the mean of this distribution. Most time series models produce normally distributed forecasts.
Prediction Intervals
A prediction interval is the interval within which we expect \(y_t\) to lie with a specified probability.
Assuming a normal distribution, the 95% prediction interval for the \(h\)-step forecast is:
\[ \hat{y}_{T+h|T} \pm 1.96\hat{\sigma}_h \] where \(\hat{\sigma}_h\) is the standard deviation of the \(h\)-step forecast distribution.
More generally, the interval can be written as \(\hat{y}_{T+h|T} \pm c\hat{\sigma}_h\), where \(c\) is the multiplier corresponding to the desired coverage probability.
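The three values below are the multipliers for 50%, 80% and 95% intervals, presumably obtained from the normal quantile function (a sketch of the calls):
# Multipliers c for 50%, 80% and 95% prediction intervals
qnorm(0.75)
qnorm(0.90)
qnorm(0.975)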
[1] 0.6744898
[1] 1.281552
[1] 1.959964
One-Step Prediction Intervals
When forecasting one-step ahead, the standard deviation of the forecast distribution is almost the same as the standard deviation of the residuals.
[1] 758.88
# Standard deviation of the residuals
google_2015 %>%
model(naive = NAIVE(Close)) %>%
augment() %>%
pull(.resid) %>%
sd(na.rm = TRUE)
[1] 11.17197
[1] 744.5651 773.1949
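As a rough check (a sketch, assuming the last closing price and the residual standard deviation shown above stand in for \(\hat{\sigma}_1\)), the 80% one-step interval can be reproduced approximately by hand:
# Approximate 80% one-step prediction interval
last_close <- 758.88    # last observed closing price (from above)
sigma_hat <- 11.17197   # one-step residual standard deviation (from above)
last_close + c(-1, 1) * qnorm(0.90) * sigma_hat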
Multi-Step Prediction Intervals
Prediction intervals generally widen as the forecast horizon increases, because \(\sigma_h\) increases with \(h\).
Benchmark Methods
Here are the formulas for \(\hat{\sigma}_h\) under the benchmark methods, where \(\hat{\sigma}\) is the residual standard deviation.
- Mean: \(\hat{\sigma}_h = \hat{\sigma} \sqrt{1 + 1/T}\)
- Naive: \(\hat{\sigma}_h = \hat{\sigma} \sqrt{h}\)
- Seasonal Naive: \(\hat{\sigma}_h = \hat{\sigma} \sqrt{k + 1}\)
- \(k\) is the integer part of \((h - 1)/m\)
- \(m\) is the seasonal period
- Drift: \(\hat{\sigma}_h = \hat{\sigma} \sqrt{h(1 + h/T)}\)
When \(h = 1\) and \(T\) is large, these all give the same approximate value for \(\hat{\sigma}\).
Prediction intervals can be computed using the fable package.
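The table below was presumably produced with something like the following, where hilo() exposes the 80% and 95% intervals as columns:
# Naive forecasts of beer production with 80% and 95% prediction intervals
aus_production %>%
model(naive = NAIVE(Beer)) %>%
forecast(h = '3 years') %>%
hilo(level = c(80, 95))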
## # A tsibble: 12 x 6 [1Q]
## # Key: .model [1]
## .model Quarter Beer .mean `80%`
## <chr> <qtr> <dist> <dbl> <hilo>
## 1 naive 2010 Q3 N(374, 4580) 374 [287.27351, 460.7265]80
## 2 naive 2010 Q4 N(374, 9159) 374 [251.35022, 496.6498]80
## 3 naive 2011 Q1 N(374, 13739) 374 [223.78531, 524.2147]80
## 4 naive 2011 Q2 N(374, 18319) 374 [200.54702, 547.4530]80
## 5 naive 2011 Q3 N(374, 22898) 374 [180.07367, 567.9263]80
## 6 naive 2011 Q4 N(374, 27478) 374 [161.56435, 586.4357]80
## 7 naive 2012 Q1 N(374, 32057) 374 [144.54327, 603.4567]80
## 8 naive 2012 Q2 N(374, 36637) 374 [128.70044, 619.2996]80
## 9 naive 2012 Q3 N(374, 41217) 374 [113.82052, 634.1795]80
## 10 naive 2012 Q4 N(374, 45796) 374 [ 99.74675, 648.2532]80
## 11 naive 2013 Q1 N(374, 50376) 374 [ 86.36077, 661.6392]80
## 12 naive 2013 Q2 N(374, 54956) 374 [ 73.57062, 674.4294]80
## # … with 1 more variable: `95%` <hilo>
aus_production %>%
model(naive = NAIVE(Beer)) %>%
forecast(h = "3 years") %>%
autoplot(filter_index(aus_production, '2000' ~ .))
Prediction Intervals from Bootstrapped Residuals
When a normal distribution is an unreasonable assumption, an alternative is to bootstrap the residuals. The only assumption then is that the residuals are uncorrelated.
A one-step forecast error is defined as \(e_t = y_t - \hat{y}_{t|t-1}\), which can be rewritten as \(y_t = \hat{y}_{t|t-1} + e_t\).
The next observation can be written as \(y_{T+1} = \hat{y}_{T+1|T} + e_{T+1}\), i.e. a one-step forecast plus a future error. If we assume that future errors will be similar to past errors, then \(e_{T+1}\) can be replaced by sampling from the errors (residuals) we’ve seen in the past. The same can then be done for \(T+2, \ldots\).
The generate() function can be used for this:
google_2015 %>% {
model(., naive = NAIVE(Close)) %>%
generate(h = 30, times = 5, bootstrap = TRUE) %>%
ggplot() +
geom_line(aes(day, .sim, colour = .rep)) +
geom_line(data = ., aes(day, Close)) +
labs(
x = 'Day',
y = 'Closing Price (USD)',
title = 'Google Stock Price',
subtitle = 'Naive Forecast - Bootstrapped'
)
}
Prediction intervals can be computed by calculating percentiles of the future sample paths for each forecast horizon. The result is a bootstrapped prediction interval. This is done within the forecast() function.
google_2015 %>%
model(NAIVE(Close)) %>%
forecast(h = 30, bootstrap = TRUE) %>%
autoplot(google_2015) +
labs(
title = 'Google Stock Closing Price',
subtitle = 'Naive Forecast with Bootstrapping'
)
The number of samples can be controlled using the times argument of forecast().
Forecasting Using Transformations
When forecasting with a model that uses a transformation, the transformation needs to be reversed (back-transformed) to obtain forecasts on the original scale.
The fable package will automatically back-transform the forecasts whenever a transformation has been used in the model definition.
Prediction Intervals with Transformations
If a transformation has been used, then the prediction interval is computed on the transformed scale and the end points are back-transformed. This preserves the probability coverage, although the interval will no longer be symmetric around the point forecast.
Forecasting with Constraints
Transformations can be used to ensure forecasts remain on an appropriate scale. For example, a log transformation constrains the forecasts to stay positive, and a scaled logit (logistic) transformation constrains the forecasts to lie between two bounds \(a\) and \(b\):
\[ f(x) = \log \bigg( \frac{ x - a }{ b - x } \bigg) \] Inverting this transformation:
\[ f^{-1}(x) = \frac{ a + be^x }{ 1 + e^x } = \frac{ (b - a)e^x }{ 1 + e^x } + a \]
The new_transformation() function can be used to use a custom transformation with a model. It is of the form:
logistic <- new_transformation(
transformation = function(x, lower = 0, upper = 1) {
log((x - lower)/(upper - x))
},
inverse = function(x, lower = 0, upper = 1) {
(upper - lower)*exp(x) / (1 + exp(x)) + lower
}
)
You could then use logistic(y, 0, 100) on the left-hand side of the formula.
Bias Adjustments
An issue with transformations, such as the Box-Cox transformation, is that the back-transformed point forecast will not be the mean of the forecast distribution; it will usually be the median.
This can be acceptable, but you may want the mean. For example, adding up sales forecasts from various regions to form a forecast for the entire country. Means add up, but medians don’t.
The reverse Box-Cox transformation:
$$ y_t = \begin{cases} \exp(w_t) & \lambda = 0; \\ (\lambda w_t + 1)^{1/\lambda} & \text{otherwise} \end{cases} $$
The simple back-transformed forecast given above is the median of the forecast distribution; the difference between it and the mean of the distribution is the bias. When the mean is used as the point forecast instead, we say the point forecasts have been bias-adjusted.
We can see how much difference the bias-adjustment makes:
eggs %>%
as_tsibble() %>%
model(rw = RW(log(value) ~ drift())) %>%
forecast(h = 50) %>%
autoplot(eggs, point_forecast = lst(mean, median)) +
labs(
x = 'Year',
y = 'Price (constant dollars)',
title = 'Cost of Eggs in the US'
)
## Warning: Ignoring unknown aesthetics: linetype
Bias-adjusted forecast means are automatically computed in the fable package when using mean() on a distribution. The median (the point forecast prior to bias adjustment) can be obtained using the median() function.
Forecasting with Decomposition
Assuming an additive decomposition, the decomposed time series can be written as:
\[ \hat{y}_t = \hat{S}_t + \hat{A}_t \] where \(\hat{A}_t = \hat{T}_t + \hat{R}_t\) is the seasonally adjusted component.
To forecast a decomposed series, we forecast the seasonal component \(\hat{S}_t\) and the seasonally adjusted component \(\hat{A}_t\) separately. The assumption is that the seasonal component is unchanging (or changing very slowly), so it is forecast by simply taking the last year of the estimated component - the seasonal naive method.
For the seasonally adjusted component, any non-seasonal forecasting method can be used:
- Random walk with drift
- Holt’s method
- Non-seasonal ARIMA
Example
In this example we forecast employment in the US retail sector:
us_employ_comp <-
us_employment %>%
filter_index('1990' ~ .) %>%
filter(Title == 'Retail Trade') %>%
model(STL = STL(Employed ~ trend(window = 7), robust = TRUE)) %>%
components() %>%
select(-.model)
us_employ_comp %>%
model(NAIVE(season_adjust)) %>%
forecast() %>%
autoplot(us_employ_comp) +
labs(
x = 'Month',
y = 'Persons (thousands)',
title = 'US Retail Trade',
subtitle = 'Naive Forecast of Seasonally Adjusted Data'
)
This shows naive forecasts of the seasonally adjusted data.
These are then ‘reseasonalised’ by adding in the seasonal naive forecasts of the seasonal component. The decomposition_model() function makes this easier. It allows forecasts to be computed via any additive decomposition. Seasonal components will be forecast automatically using SNAIVE() if no model is specified.
us_retail <-
us_employment %>%
filter_index('1990' ~ .) %>%
filter(Title == 'Retail Trade')
us_decomp_fit <-
us_retail %>%
model(stlf = decomposition_model(
STL(Employed ~ trend(window = 7), robust = TRUE),
NAIVE(season_adjust)
))
us_decomp_fit %>%
forecast() %>%
autoplot(us_retail)
The upper and lower bounds of the prediction intervals are ‘reseasonalised’ by adding in the forecasts of the seasonal component.
The ACF of the residuals shows significant autocorrelations, because the naive method does not capture the changing trend in the seasonally adjusted series:
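Presumably the residual diagnostics were produced along these lines (a sketch; the original chunk isn't shown):
# Residual diagnostics of the decomposition model
us_decomp_fit %>% gg_tsresiduals()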
## Warning: Residuals of type `innovation` are not supported for STL decomposition model models.
## Defaulting to `type="response"`
## Warning: Removed 12 row(s) containing missing values (geom_path).
## Warning: Removed 12 rows containing missing values (geom_point).
## Warning: Removed 12 rows containing non-finite values (stat_bin).
As we move along in this study, more suitable methods for forecasting the seasonally adjusted component will be introduced, rather than relying on the naive method.
Evaluating Forecast Accuracy
Training and Test Sets
It’s common to split data into training and test sets, with the test set being around 20% of the data.
Some notes:
- A model which fits the training data well will not necessarily forecast well.
- A perfect fit can always be obtained by using a model with enough parameters.
- Overfitting a model to data is just as bad as failing to identify a systematic pattern in the data.
Function to Subset Time Series
The filter() function can be used:
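The exact calls behind the output below aren't shown; the following are plausible (assumed) examples of the kinds of subsetting illustrated:
# First few rows from 1995 onwards
aus_production %>% filter(year(Quarter) >= 1995)
# All first-quarter observations
aus_production %>% filter(quarter(Quarter) == 1)
# The last four observations
aus_production %>% slice(n() - 3:0)
# One grouped series per key
aus_retail %>% group_by(State, Industry)
# The day each stock reached its highest closing price
gafa_stock %>% group_by(Symbol) %>% filter(Close == max(Close))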
# A tsibble: 3 x 7 [1Q]
Quarter Beer Tobacco Bricks Cement Electricity Gas
<qtr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1995 Q1 426 4714 430 1626 41768 131
2 1995 Q2 408 3939 457 1703 43686 167
3 1995 Q3 416 6137 417 1733 46022 181
# A tsibble: 3 x 7 [1Q]
Quarter Beer Tobacco Bricks Cement Electricity Gas
<qtr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1956 Q1 284 5225 189 465 3923 5
2 1957 Q1 262 5577 187 529 4339 5
3 1958 Q1 272 5758 199 554 4608 5
# A tsibble: 4 x 7 [1Q]
Quarter Beer Tobacco Bricks Cement Electricity Gas
<qtr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2009 Q3 419 NA NA 2325 58394 252
2 2009 Q4 488 NA NA 2273 57336 210
3 2010 Q1 414 NA NA 1904 58309 205
4 2010 Q2 374 NA NA 2401 58041 236
# A tsibble: 64,532 x 5 [1M]
# Key: State, Industry [152]
# Groups: State, Industry [152]
State Industry `Series ID` Month Turnover
<chr> <chr> <chr> <mth> <dbl>
1 Australian Capit… Cafes, restaurants and … A3349849A 1982 Apr 4.4
2 Australian Capit… Cafes, restaurants and … A3349849A 1982 May 3.4
3 Australian Capit… Cafes, restaurants and … A3349849A 1982 Jun 3.6
4 Australian Capit… Cafes, restaurants and … A3349849A 1982 Jul 4
5 Australian Capit… Cafes, restaurants and … A3349849A 1982 Aug 3.6
6 Australian Capit… Cafes, restaurants and … A3349849A 1982 Sep 4.2
7 Australian Capit… Cafes, restaurants and … A3349849A 1982 Oct 4.8
8 Australian Capit… Cafes, restaurants and … A3349849A 1982 Nov 5.4
9 Australian Capit… Cafes, restaurants and … A3349849A 1982 Dec 6.9
10 Australian Capit… Cafes, restaurants and … A3349849A 1983 Jan 3.8
# … with 64,522 more rows
# A tsibble: 4 x 8 [1D]
# Key: Symbol [4]
# Groups: Symbol [4]
Symbol Date Open High Low Close Adj_Close Volume
<chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 AAPL 2018-10-03 230. 233. 230. 232. 230. 28654800
2 AMZN 2018-09-04 2026. 2050. 2013 2040. 2040. 5721100
3 FB 2018-07-25 216. 219. 214. 218. 218. 58954200
4 GOOG 2018-07-26 1251 1270. 1249. 1268. 1268. 2405600
Forecast Errors
An error is the difference between an observed value and a forecast:
\[ e_{T+h} = y_{T+h} - \hat{y}_{T+h|T} \]
where the training data is \(\{y_1,\ldots,y_T\}\) and the test data is \(\{y_{T+1},y_{T+2}, \ldots\}\).
- Residuals are calculated on the training set, and forecast errors on the test set.
- Residuals are based on one-step forecasts, while forecast errors can involve multi-step forecasts.
Scale Dependence
Forecast errors are on the same scale as the data, and therefore cannot be compared between data sets. The two most commonly used scale-dependent measures are:
- Mean Absolute Error - \(MAE = mean(|e_t|)\)
- Root Mean Squared Error - \(RMSE = \sqrt{mean(e^2_t)}\)
A forecast method that minimises the MAE will lead to forecasts of the median, while minimising the RMSE will lead to forecasts of the mean.
Percentage Errors
Percentage errors are given by \(p_t = 100 e_t / y_t\). They are unit-free, so they can be used to compare forecast performance between data sets. The most commonly used measure is the mean absolute percentage error:
\[ MAPE = mean(|p_t|) \]
The disadvantage is that the percentage error is infinite or undefined if \(y_t = 0\). Percentage errors also have the disadvantage of putting a heavier penalty on negative errors than on positive errors. The symmetric MAPE (sMAPE) was proposed to resolve this:
\[ sMAPE = mean(200|y_t - \hat{y}_t| / (y_t + \hat{y}_t)) \]
Hyndman and Koehler recommend that sMAPE not be used.
Scaled Errors
These were proposed by Hyndman and Koehler as an alternative to using percentage errors. They proposed scaling the errors based on the training MAE.
For non-seasonal time series, scaled errors use naive forecasts:
\[ q_j = \frac{ e_j }{ \frac{1}{T - 1} \sum_{t=2}^T |y_t - y_{t-1}| } \]
Because the numerator and denominator both involve values on the scale of the original data, \(q_j\) is independent of the scale of the data. A scaled error is less than one if it arises from a better forecast than the average naive forecast.
For seasonal time series, a scaled error can be defined using seasonal naive forecasts:
\[ q_j = \frac{ e_j }{ \frac{1}{T - m} \sum_{t=m+1}^T |y_t - y_{t-m}| } \]
The mean absolute scaled error is simply
\[ MASE = mean(|q_j|) \]
Examples
beer_post_1992 <-
aus_production %>%
filter_index('1992' ~ .)
# Extract training set
beer_train <-
beer_post_1992 %>%
filter_index(. ~ '2007')
# Train the models
beer_fit <-
beer_train %>%
model(
mean = MEAN(Beer),
naive = NAIVE(Beer),
`seasonal naive` = SNAIVE(Beer),
drift = RW(Beer ~ drift())
)
# Forecast based on the models
beer_forecast <-
beer_fit %>%
forecast(h = 13)
# Plot the forecasts against the full data
beer_forecast %>%
autoplot(filter_index(aus_production, '1992' ~ .), level = NULL)
The accuracy of the forecasts can be measured using the accuracy() function:
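Presumably the call compared the forecasts against the full post-1992 series (a sketch; the original chunk isn't shown):
# Test-set accuracy of the four benchmark methods
beer_forecast %>% accuracy(beer_post_1992)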
## # A tibble: 4 x 9
## .model .type ME RMSE MAE MPE MAPE MASE ACF1
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 drift Test -4.67 37.3 31.4 -1.85 7.37 2.17 -0.0646
## 2 mean Test -15.8 40.2 37.4 -4.52 8.98 2.58 -0.0701
## 3 naive Test -6.54 37.5 32.4 -2.30 7.64 2.23 -0.0701
## 4 seasonal naive Test -3.62 10.2 9 -0.840 2.16 0.621 0.352
It’s already obvious from the graph that the seasonal naive is the best model.
Time Series Cross-Validation
In time series cross-validation there is a series of test sets, each consisting of a single observation. The corresponding training set consists only of observations that occurred prior to the test observation. Since a reasonable number of observations are needed to train a model, the earliest observations are not used as test sets.
The forecast accuracy is calculated by averaging over all of the test sets.
The cross-validation procedure can be modified to allow multi-step errors to be used.
Example
google_2015_train <-
google_2015 %>%
slice(1:(n() - 1)) %>%
stretch_tsibble(.init = 3, .step = 1)
google_2015_fc <-
google_2015_train %>%
model(RW(Close ~ drift())) %>%
forecast(h = 1)
google_2015_fc %>% accuracy(google_2015)
## # A tibble: 1 x 10
## .model Symbol .type ME RMSE MAE MPE MAPE MASE ACF1
## <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 RW(Close ~ drift… GOOG Test 0.726 11.3 7.26 0.112 1.19 1.02 0.0985
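The second table below reports training-set (residual) accuracy; presumably it came from fitting the same model to the full series and calling accuracy() on the fitted model:
# Training-set (residual) accuracy for comparison
google_2015 %>%
model(RW(Close ~ drift())) %>%
accuracy()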
## # A tibble: 1 x 10
## Symbol .model .type ME RMSE MAE MPE MAPE MASE ACF1
## <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 GOOG RW(Close ~… Trai… -2.97e-14 11.1 7.16 -0.0267 1.18 1.00 0.0976
Exercises
- Produce forecasts for the following series using whichever of NAIVE(y), SNAIVE(y) or RW(y ~ drift()) is more appropriate in each case:
- Australian Population (global_economy)
The population trends upwards without any seasonal pattern, so the drift method would be the most appropriate.
global_economy %>%
filter(Code == 'AUS') %>%
model(Drift = RW(Population ~ drift())) %>%
forecast(h = 10) %>%
autoplot(global_economy) +
labs(
title = 'Australian Population',
subtitle = 'Drift Forecast'
)
- Bricks (aus_production)
We’ll try a seasonal naive:
aus_production %>%
drop_na(Bricks) %>%
model(snaive = SNAIVE(Bricks)) %>%
forecast(h = 20) %>%
autoplot(aus_production)
## Warning: Removed 20 row(s) containing missing values (geom_path).
- NSW Lambs (aus_livestock)
nsw_lambs <-
aus_livestock %>%
filter(Animal == 'Lambs' & State == 'New South Wales')
nsw_lambs %>%
model(drift = RW(Count ~ drift())) %>%
forecast(h = 30) %>%
autoplot(nsw_lambs)
- Use the Facebook stock price (data set gafa_stock) to do the following:
- Produce a time plot of the series.
- Produce forecasts using the drift method and plot them.
google_forecast <-
google_stock %>%
filter_index('2018' ~ .) %>%
fill_gaps() %>%
model(drift = RW(Close ~ drift())) %>%
forecast(h = 100)
google_forecast %>% autoplot(filter_index(google_stock, '2018' ~ .))
- Show that the forecasts are identical to extending the line drawn between the first and last observations.
google_forecast %>%
autoplot(filter_index(google_stock, '2018' ~ .)) +
geom_segment(
aes(
x = Date[1],
y = Close[1],
xend = Date[length(Date)],
yend = Close[length(Close)]
),
color = 'red',
linetype = 'dashed'
)
Try using some of the other benchmark functions to forecast the same data set. Which do you think is best? Why?
google_stock %>%
filter_index('2016' ~ .) %>%
fill_gaps() %>%
model(
Mean = MEAN(Close),
Naive = NAIVE(Close),
`Seasonal Naive` = SNAIVE(Close),
Drift = RW(Close ~ drift())
) %>%
forecast(h = 400) %>%
autoplot(filter_index(google_stock, '2016' ~ .), level = NULL)
## Warning: Removed 2 row(s) containing missing values (geom_path).
Which one is going to be best? Stock price is such a random process that it would be hard to even guess.
- Produce forecasts for all of the Victorian series in aus_livestock using SNAIVE(). Plot the resulting forecasts including the historical data. Is this a reasonable benchmark for these series?
vic_livestock <-
aus_livestock %>%
filter(Animal == 'Calves' | Animal == 'Lambs') %>%
filter(State == 'Victoria')
vic_livestock %>%
model(`Seasonal Naive` = SNAIVE(Count)) %>%
forecast(h = 50) %>%
autoplot(vic_livestock)
From a purely visual perspective, the seasonal naive does look like a reasonable benchmark for these series.
- Calculate the residuals from a seasonal naïve forecast applied to the quarterly Australian beer production data from 1992.
post_1992_beer <-
aus_production %>%
filter_index('1992' ~ .)
beer_fit <-
post_1992_beer %>%
model(snaive = SNAIVE(Beer))
beer_fit %>% gg_tsresiduals()
## Warning: Removed 4 row(s) containing missing values (geom_path).
## Warning: Removed 4 rows containing missing values (geom_point).
## Warning: Removed 4 rows containing non-finite values (stat_bin).
We can conclude that beer production is highly seasonal, with significant autocorrelation at a lag of 4 quarters (1 year).
- Repeat the previous exercise using the Australian Exports series from global_economy and the Bricks series from aus_production. Use whichever of NAIVE() or SNAIVE() is more appropriate in each case.
# Looking at the series
aus_economy <-
global_economy %>%
filter(Code == 'AUS')
aus_economy %>%
autoplot(Exports)
# There doesn't appear to be seasonality, only trend and cycle.
# Naive is more appropriate in this case
aus_export_model <-
aus_economy %>%
model(naive = NAIVE(Exports))
aus_export_model %>% gg_tsresiduals()
## Warning: Removed 1 row(s) containing missing values (geom_path).
## Warning: Removed 1 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing non-finite values (stat_bin).
- Are the following statements true or false? Explain your answer.
- Good forecast methods should have normally distributed residuals.
- A good forecast method doesn’t require normally distributed residuals; however, normality (together with constant variance) makes the calculation of prediction intervals easier.
- A model with small residuals will give good forecasts.
- A model with small residuals means it has been fit well, however it doesn’t necessarily mean it will give good forecasts.
- The best measure of forecast accuracy is MAPE.
- No single accuracy measure is best in all situations; MAPE in particular is undefined when \(y_t = 0\) and penalises negative errors more heavily than positive ones.
- If your model doesn’t forecast well, you should make it more complicated.
- Making a model more complicated generally means making it more flexible. You may be able to get a better fit to the training data, but these high variance, low bias models will likely perform worse on a test set.
- Always choose the model with the best forecast accuracy as measured on the test set.
- If you want to use the model for forecasting, then this is correct.
- For your retail time series (from Exercise 6 in Section 2.10):
- Create a training dataset consisting of observations before 2011:
aus_retail_hospitality <- aus_retail %>%
filter(`Series ID` == 'A3349640L')
aus_retail_hospitality_train <-
aus_retail_hospitality %>%
filter_index(. ~ '2011')
- Check that your data have been split appropriately by producing the following plot.
aus_retail_hospitality %>%
autoplot(Turnover) +
autolayer(aus_retail_hospitality_train, Turnover, colour = 'blue')
- Calculate seasonal naïve forecasts using SNAIVE() applied to your training data
aus_retail_model <-
aus_retail_hospitality_train %>%
model(`seasonal naive` = SNAIVE(Turnover))
aus_retail_forecast <-
aus_retail_model %>%
forecast()
- Compare the accuracy of your forecasts against the actual values.
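The two tables below (training-set and test-set accuracy) were presumably produced with calls along these lines:
# Training-set accuracy of the fitted model
aus_retail_model %>% accuracy()
# Test-set accuracy of the forecasts against the full series
aus_retail_forecast %>% accuracy(aus_retail_hospitality)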
## # A tibble: 1 x 11
## State Industry .model .type ME RMSE MAE MPE MAPE MASE ACF1
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Victo… Cafes, res… seaso… Trai… 13.1 29.8 20.6 7.46 11.5 1 0.820
## # A tibble: 1 x 11
## .model State Industry .type ME RMSE MAE MPE MAPE MASE ACF1
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 season… Victo… Cafes, re… Test -28.9 37.1 30.3 -7.79 8.16 1.47 0.346
- Check the residuals.
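The plots behind the warnings below were presumably produced with gg_tsresiduals() on the fitted model:
# Residual diagnostics of the seasonal naive model
aus_retail_model %>% gg_tsresiduals()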
## Warning: Removed 12 row(s) containing missing values (geom_path).
## Warning: Removed 12 rows containing missing values (geom_point).
## Warning: Removed 12 rows containing non-finite values (stat_bin).
The residuals are correlated, with a 24-month seasonal pattern. The residuals look reasonably normal, with the exception that there seems to be a bias towards positive residuals.
- How sensitive are the accuracy measures to the amount of training data used?
snaive_accuracy <- function(var, train, test) {
train %>%
model(SNAIVE({{ var }})) %>%
forecast() %>%
accuracy(test) %>%
pull(RMSE)
}
years <- c(1987:2011)
years %>%
as.character() %>%
map(~filter_index(aus_retail_hospitality, . ~ .x)) %>%
map_dbl(~snaive_accuracy(Turnover, .x, aus_retail_hospitality)) %>% {
tibble(
rmse = .,
year_range = years
)
} %>%
ggplot(aes(years, rmse)) +
geom_line() +
geom_point() +
labs(
x = 'Year Range: 1982 - x',
y = 'RMSE of SNAIVE Model',
title = 'RMSE of Test Error'
)
We can see that the RMSE is fairly stable for training sets ending up to around 1996; as the training window extends beyond that, the RMSE starts to increase.
- Consider the number of pigs slaughtered in New South Wales (data set aus_livestock).
- Produce some plots of the data in order to become familiar with it.
au_pigs <-
aus_livestock %>%
filter(Animal == 'Pigs' & State == 'New South Wales')
au_pigs %>% gg_tsdisplay(Count)
- Create a training set of 486 observations, withholding a test set of 72 observations (6 years).
au_pigs_train <- au_pigs %>%
mutate(id = 1:n()) %>%
filter(id <= 486) %>%
select(-id)
au_pigs_test <- au_pigs %>%
mutate(id = 1:n()) %>%
filter(id > 486) %>%
select(-id)
- Try using various benchmark methods to forecast the training set and compare the results on the test set. Which method did best?
au_pigs_models <-
au_pigs_train %>%
model(
mean = MEAN(Count),
naive = NAIVE(Count),
snaive = SNAIVE(Count),
drift = RW(Count ~ drift())
)
au_pigs_models %>%
forecast(au_pigs_test) %>%
accuracy(au_pigs_test)
## # A tibble: 4 x 11
## .model Animal State .type ME RMSE MAE MPE MAPE MASE
## <chr> <fct> <fct> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 drift Pigs New … Test -4685. 8091. 6967. -7.36 10.1 NaN
## 2 mean Pigs New … Test -39360. 39894. 39360. -55.9 55.9 NaN
## 3 naive Pigs New … Test -6138. 8941. 7840. -9.39 11.4 NaN
## 4 snaive Pigs New … Test -5838. 10111. 8174. -8.81 11.9 NaN
## # … with 1 more variable: ACF1 <dbl>
Judging by RMSE and MAE, the drift method performed best on the test data, with the naive and seasonal naive methods close behind. (The MASE values are NaN, likely because accuracy() was given only the test data, so there is no training portion from which to compute the scaling factor.)
- Check the residuals of your preferred method. Do they resemble white noise?
Recall that residuals come from the training set, not the test set; the corresponding quantities on the test set are the forecast errors.
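The warnings below (12 missing fitted values) suggest the seasonal naive residuals were plotted; a sketch, assuming that model:
# Residual diagnostics of the seasonal naive model on the training set
au_pigs_models %>%
select(snaive) %>%
gg_tsresiduals()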
## Warning: Removed 12 row(s) containing missing values (geom_path).
## Warning: Removed 12 rows containing missing values (geom_point).
## Warning: Removed 12 rows containing non-finite values (stat_bin).
The residuals do not look like white noise; there is some autocorrelation within the first six lags.
- Create a training set for household wealth (hh_budget) by withholding the last four years as a test set.
library(tsibbledata)
hh_budget_train <-
hh_budget %>%
filter_index(. ~ '2012')
hh_budget_test <-
hh_budget %>%
filter_index('2013' ~ .)
- Fit all the appropriate benchmark methods to the training set and forecast the periods covered by the test set.
hh_budget_models <-
hh_budget_train %>%
model(
mean = MEAN(Wealth),
naive = NAIVE(Wealth),
snaive = SNAIVE(Wealth),
drift = RW(Wealth ~ drift())
)
## Warning: 4 errors (1 unique) encountered for snaive
## [4] Non-seasonal model specification provided, use RW() or provide a different lag specification.
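The forecasting step isn't shown but is needed for the accuracy calculation below; presumably it was along these lines:
# Forecast over the test period (the snaive specification above failed for annual data)
hh_budget_forecast <-
hh_budget_models %>%
forecast(hh_budget_test)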
- Compute the accuracy of your forecasts. Which method does best?
hh_budget_forecast %>% accuracy(hh_budget_test) %>%
group_by(Country) %>%
arrange(RMSE) %>%
slice(1)
## # A tibble: 4 x 10
## # Groups: Country [4]
## .model Country .type ME RMSE MAE MPE MAPE MASE ACF1
## <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 drift Australia Test 29.1 35.5 29.1 7.23 7.23 NaN 0.210
## 2 drift Canada Test 33.3 37.2 33.3 6.09 6.09 NaN -0.229
## 3 drift Japan Test 14.7 17.9 14.7 2.44 2.44 NaN -0.229
## 4 drift USA Test 75.9 76.2 75.9 12.7 12.7 NaN -0.561
The drift model had the best result (measured using RMSE) for all four countries in the series.
- Do the residuals from the best method resemble white noise?
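gg_tsresiduals() needs a single model and key, so the plots behind the warnings below were presumably produced one country at a time; a sketch, assuming the Australian series:
# Residual diagnostics of the drift model for one country
hh_budget_models %>%
filter(Country == 'Australia') %>%
select(drift) %>%
gg_tsresiduals()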
## Warning: Removed 1 row(s) containing missing values (geom_path).
## Warning: Removed 1 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing non-finite values (stat_bin).
## Warning: Removed 1 row(s) containing missing values (geom_path).
## Warning: Removed 1 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing non-finite values (stat_bin).
The residuals do appear to be white noise.
- Create a training set for Australian takeaway food turnover (aus_retail) by withholding the last four years as a test set.
aus_takeaway_turnover <-
aus_retail %>%
filter(Industry == 'Takeaway food services')
aus_taway_train <-
aus_takeaway_turnover %>%
filter_index(. ~ '1994')
aus_taway_test <-
aus_takeaway_turnover %>%
filter_index('1995' ~ .)
- Fit all the appropriate benchmark methods to the training set and forecast the periods covered by the test set.
aus_taway_models <-
aus_taway_train %>%
model(
mean = MEAN(Turnover),
naive = NAIVE(Turnover),
snaive = SNAIVE(Turnover),
drift = RW(Turnover ~ drift())
)
aus_taway_forecast <-
aus_taway_models %>%
forecast(aus_taway_test)
- Compute the accuracy of your forecasts. Which method does best?
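The table below (best method per state, ranked by RMSE) was presumably produced with something like:
# Best benchmark method per state by RMSE
aus_taway_forecast %>%
accuracy(aus_taway_test) %>%
group_by(State) %>%
arrange(RMSE) %>%
slice(1)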
## # A tibble: 8 x 11
## # Groups: State [8]
## .model State Industry .type ME RMSE MAE MPE MAPE MASE ACF1
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 drift Austr… Takeawa… Test 0.496 2.69 2.00 -1.05 13.0 NaN 0.901
## 2 drift New S… Takeawa… Test 42.8 87.7 60.8 6.68 16.9 NaN 0.934
## 3 snaive North… Takeawa… Test 7.89 9.87 7.89 54.3 54.3 NaN 0.966
## 4 drift Queen… Takeawa… Test 24.2 35.0 26.5 9.48 11.0 NaN 0.804
## 5 drift South… Takeawa… Test 2.81 8.29 5.83 1.84 9.14 NaN 0.807
## 6 drift Tasma… Takeawa… Test -2.48 3.54 2.94 -17.7 19.6 NaN 0.818
## 7 drift Victo… Takeawa… Test 39.2 63.1 47.0 11.9 18.4 NaN 0.938
## 8 drift Weste… Takeawa… Test 6.22 19.0 15.6 0.159 15.1 NaN 0.906
The drift method appears to have performed best for all states except the Northern Territory, where the seasonal naive method did best.
- Do the residuals from the best method resemble white noise?
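The plots behind the warnings below aren't shown; a sketch, assuming the drift model for a single state (the Victorian series here):
# Residual diagnostics of the drift model for one state
aus_taway_models %>%
filter(State == 'Victoria') %>%
select(drift) %>%
gg_tsresiduals()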
## Warning: Removed 1 row(s) containing missing values (geom_path).
## Warning: Removed 1 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing non-finite values (stat_bin).
## Warning: Removed 1 row(s) containing missing values (geom_path).
## Warning: Removed 1 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing non-finite values (stat_bin).
The residuals don’t look like white noise; there is still some six-month seasonality that the model hasn’t captured.