library(tidyverse)
library(forecast)
library(feasts)
library(fma)
library(magrittr)
library(fpp3)
library(tsibble)
library(lubridate)

Time series data can exhibit a variety of patterns, and it is often useful to split a time series into its components.

Transformations and Adjustments

Adjusting the historical data can often lead to a simpler time series. The adjustments either remove sources of variation or make the pattern more consistent across the whole data set.

Calendar Adjustments

Some of the variation in seasonal data may be due to calendar effect. For example when looking at monthly sales there will be variation between the months because of the differing number of trading days.

It is easy to remove this variation by dividing the sales total by the number of trading days.

Population Adjustments

Any data that are affected by population changes can be adjusted to give per-captia data.

global_economy %>% 
    filter(Country == 'Australia') %>% 
    autoplot(GDP / Population) +
    labs(
        x = 'Year',
        y = 'GDP per Capita',
        title = 'Australian GDP per Capita'
    )

Inflation

Data which are affected by the value of money are best adjusted before modelling. To make these adjustments a price index $z_t$ is used. If $y_t$ is, for example, a house price in year $t$, then $x_t = y_t / z_t * z_{2000}$ is the adjusted price at year 2000 dollar levels.

print_retail <- aus_retail %>% 
    filter(Industry == 'Newspaper and book retailing') %>% 
    group_by(Industry) %>% 
    index_by(Year = year(Month)) %>% 
    summarise(Turnover = sum(Turnover))

aus_economy <- global_economy %>% 
    filter(Code == 'AUS')

print_retail %>% 
    left_join(aus_economy, by = 'Year') %>% 
    mutate(Adjusted_turnover = Turnover / CPI) %>% 
    gather('Type', 'Turnover', Turnover, Adjusted_turnover, factor_key = TRUE) %>% 
    ggplot(aes(Year, Turnover)) +
    geom_line() +
    facet_grid(vars(Type), scales = 'free') +
    labs(
        x = 'Year',
        y = 'Turnover (raw and adjusted)',
        title = 'Turnover for the Australian Print Media Industry'
    )

## Warning: Removed 1 row(s) containing missing values (geom_path).

By adjusting for inflation using CPI we can see that Australia’s newspaper industry has been in decline for much longer than the original data suggests.

Mathematical Transformations

If the data show variation that increases or decreases with the level of the series, a transformation can be useful. For example a log transformation $w_t = log(y_t)$ is often useful.

Logarithms are useful because they are interpretable: changes in a log value a relative (or percentage) changes on the original scale. If base 10 is used, an increase of 1 on the log scale is a multiplication of 10 on the original scale. They also constrain forecasts to be positive on the orignal scale.

Power transformations of the form $w_t = y_t^p$ can be used.

A useful family of transformations are the Box-Cox transformations. These depend on a parameter $\lambda$:

\[ w_t = \begin{cases} log(y_t), & \text{if } \lambda = 0; \\ (y_t^{\lambda} - 1)/\lambda & \text{otherwise}. \end{cases} \]

This is always done with a natural logarithm. For $\lambda = 0$ the logarithm is used, and for $\lambda = 1$ the values are shifted down by one. But for all other values of $\lambda$ the time series will change shape.

A good value for $\lambda$ is one that makes the size of the seasonal variation the same across the whole series.

The guerrero feature can be used to choose a lambda for you.

lambda <- aus_production %>% 
    features(Gas, features = guerrero) %>% 
    pull(lambda_guerrero)

aus_production %>%
    mutate('Gas Box-Cox Adjusted' = box_cox(Gas, lambda)) %>% 
    gather('Type', 'Value', Gas, `Gas Box-Cox Adjusted`) %>% 
    ggplot(aes(Quarter, Value)) +
    geom_line() +
    facet_grid(vars(Type), scales = 'free') +
    labs(
        x = 'Quarter',
        y = 'Gas Production (petajoules)',
        title = 'Australian Gas Production (Box-Cox adjusted)'
    )

Time Series Components

If we assume an additive decomposition, then we can write:

\[ y_t = S_t + T_t + R_t \]

Where we have the Seasonal, Trend and Remainder components. Alternatively a multiplicative decomposition would be:

\[ y_t = S_t \times T_t \times R_t \]

Additative is most appropriate if the magnitude of the seasonal fluctuations, or the variation around the trend cycle, doesn’t vary with the level of the time series. Multiplicative decompositions are approprite when they do. Multiplicative is most common with economic time series.

An alternative to multiplicative decomposition is to transform the data until the variation appears to be stable over time:

\[ y_t = S_t \times T_t \times R_t \text{, then} \\ log(y_t) = log(S_t) + log(T_t) + log(R_t) \]

Employment in the US Reatil Sector

Let’s decompose the number of people employed in the US retail sector.

us_retail_employment <- us_employment %>% 
    filter(year(Month) >= 1990 & Title == 'Retail Trade') %>%
    select(-Series_ID)

us_retail_employment %>% 
    autoplot(Employed) +
    labs(
        x = 'Year',
        y = 'Persons (thousands)',
        title = 'Total Employment in US Retail Sector'
    )

We can use an STL decomposition:

us_retail_employment %>%
    model(STL(Employed)) %>%
    components()

## # A dable:           357 x 7 [1M]
## # Key:               .model [1]
## # STL Decomposition: Employed = trend + season_year + remainder
##    .model         Month Employed  trend season_year remainder season_adjust
##    <chr>          <mth>    <dbl>  <dbl>       <dbl>     <dbl>         <dbl>
##  1 STL(Employ… 1990 Jan   13256. 13291.       -38.1    3.08          13294.
##  2 STL(Employ… 1990 Feb   12966. 13272.      -261.   -44.2           13227.
##  3 STL(Employ… 1990 Mar   12938. 13252.      -291.   -23.0           13229.
##  4 STL(Employ… 1990 Apr   13012. 13233.      -221.     0.0892        13233.
##  5 STL(Employ… 1990 May   13108. 13213.      -115.     9.98          13223.
##  6 STL(Employ… 1990 Jun   13183. 13193.       -25.6   15.7           13208.
##  7 STL(Employ… 1990 Jul   13170. 13173.       -24.4   22.0           13194.
##  8 STL(Employ… 1990 Aug   13160. 13152.       -11.8   19.5           13171.
##  9 STL(Employ… 1990 Sep   13113. 13131.       -43.4   25.7           13157.
## 10 STL(Employ… 1990 Oct   13185. 13110.        62.5   12.3           13123.
## # … with 347 more rows

us_retail_employment %>%
    model(STL(Employed)) %>%
    components() %>% 
    ggplot() +
    geom_line(aes(Month, trend), colour = 'red') +
    geom_line(aes(Month, Employed), colour = 'grey') +
    labs(
        x = 'Month',
        y = 'Person (thousands)',
        title = 'Total Employment in US Retail (actual + STL)'
    )

All of the components can be plotted at once. The grey bars show the relative scales of each component.

us_retail_employment %>% 
    model(STL(Employed)) %>% 
    components() %>% 
    autoplot()

Seasonally Adjusted Data

If the seasonal component is removed from the original data, the resulting are the ‘seasonally adjusted’ data. For an additive decomposition, the seasonally adjusted data are $y_t - S_t$ and for multiplicative it is $y_t/S_t$.

us_retail_employment %>% 
    model(STL(Employed)) %>% 
    components() %>% 
    ggplot() +
    geom_line(aes(Month, Employed), colour = 'grey') +
    geom_line(aes(Month, season_adjust), colour = 'blue') +
    labs(
        x = 'Month',
        y = 'Persons (thousands)',
        title = 'Total Employment in US (seasonally adjusted)'
    )

Seasonally adjusted contain the remainder component as well as the trend-cycle.

Moving Averages

The classical method of time series decomposition is moving averages. The first step in a classical decomposition is to use moving average to estimate the trend cycle.

Moving Average Smoothing

A moving average of order $m$ can be written as:

\[ \hat{T}_t = \frac{1}{m} \sum_{j = -k}^k y_{t+j} \]

Where $m = 2k + 1$. So the estimate of the trend-cycle at time $t$ is obtained by averaging values of the time series withing $k$ period of $t$.

global_economy %>% 
    filter(Country == 'Australia') %>% 
    mutate(MA_5 = slide_dbl(Exports, mean, .size = 5, .align = 'center')) %>% 
    ggplot() +
    geom_line(aes(Year, Exports), colour = 'grey') +
    geom_line(aes(Year, MA_5), colour = 'blue') +
    labs(
        x = 'Year',
        y = 'Exports (% GDP)',
        title = 'Australian Exports (Acutal  + Moving Average)'
    )

## Warning: `slide()` is deprecated as of tsibble 0.9.0.
## Please use `slider::slide()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.

## Warning: Removed 4 row(s) containing missing values (geom_path).

The order of the moving average determines the smoothness of the trend-cycle estimate. The larger the order, the smoother the estimate. The orders are usually odd so that the window ($2k + 1$) is symmetric.

Moving Averages of Moving Averages

It’s possible to apply a moving average to a moving average. One reason is to make an even-order average symmetric. The notation $2 \times 4\text{-MA}$ means a $\text{4-MA}$ followed by a $\text{2-MA}$. Whena $\text{2-MA}$ follows an even order MA, this is then a ‘centred moving average or order $m$, where $m$ is even’. An even should be followed by an even, and an odd followed by an odd to make them symmetric.

Estimating the Trend-Cycle

Consider a $2 \times \text{4-MA}$ on quarterly data:

\[ \hat{T}_t = \frac{1}{8}y_{t-2} + \frac{1}{4}y_{t-1} + \frac{1}{4}y_{t} + \frac{1}{4}y_{t-1} + \frac{1}{8}y_{t-2} \]

Each quarter of the year is given equal weight as then first and last terms apply to the same quarter in consecutive years. The seasonal variation will be averaged out

In general a $2 \times m\text{-MA}$ is equivalent to a weighted moving average of order $m + 1$ where:

All observations take weight $1/m$
Except for the first and last, which take $1/(2m)$

So if the seasonal period is even and order $m$, we use a $2\times m\text{-MA}$ to estimate the trend cycle.

us_retail_employment %>% 
    mutate(
        `12-MA` = slide_dbl(Employed, mean, .size = 12, align = 'cr'),
        `2x12-MA` = slide_dbl(`12-MA`, mean, .size = 2, align = 'cl')
    ) %>% 
    ggplot() +
    geom_line(aes(Month, Employed), colour = 'grey') +
    geom_line(aes(Month, `2x12-MA`), colour = 'blue')

## Warning: Removed 12 row(s) containing missing values (geom_path).

Any other order choice would have resulted in the graph showing seasonality.

Weighted Moving Averages

Combinations of moving averages result in weighted moving averages. The $2 \times 4\text{-MA}$ is equivlant to a $5\text{-MA}$ with weights given by $\big[ \frac{1}{8}, \frac{1}{4}, \frac{1}{4}, \frac{1}{4}, \frac{1}{8} \big]$

In general a weighted $m\text{-MA}$ can be written as:

\[ \hat{T}_t = \sum_{j = -k}^k \alpha_j y_{t+j} \]

where $k = (m - 1)/2$ and the weights given by $\big[ \alpha_{-k}, \ldots, \alpha_k \big]$.

It’s important that the weights all sum to one and are symmetric. The simple $m\text{-MA}$ is a special case where all of the weights are equal to $1/m$.

A major advantage of the weighted moving averages is they yield a smoother curve. Obervations don’t enter and exit at full weight, rather they slowly increase and decrease.

Classical Decomposition

This originated in the 1920s and is forms the starting point for most other forms. There are two forms: additive and multiplicative.

In classical we assume that the seasonal component is constant from year to year. For multiplicative seasonality the $m$ values are sometimes called the ‘seasonal indicies’.

Additative

Step 1

If $m$ is an even number, compute the trend-cycle component $\hat{T}_t$ using a $2 \times m\text{-MA}$. If it’s odd compute using an $m\text{-MA}$.

Step 2

Calculate the detrended series $y_t - \hat{T}_t$

Step 3

To estimate the seasonal component for each season, average the detrended values for that season. For example with monthly data, the seasonal component for March is the average of all the detrended March values in the data.

These seasonal component values are then adjusted to ensure that they add to zero. The seasonal component is then obtained by stringing together these monthly values. This gives $\hat{S}_t$.

Step 4

The remainder component is calculated using $_t = y_t = _t - _t

us_retail_employment %>% 
  model(classical_decomposition(Employed, type = 'additive')) %>% 
  components() %>% 
  autoplot() +
  labs(title = 'US Retail Employment - Classical Decomposition')

## Warning: Removed 6 row(s) containing missing values (geom_path).

Multiplicative

This is the same, except the detrended series in Step 2 is $y_t/\hat{T}_t$, and the remainder component in Step 4 is $\hat{R}_t = y_t/(\hat{T}_t\hat{S}_t)$.

Comments

While still widely used, classical decomp. is not recommended. Some of the issues:

The estimate of the trend cycle is unavailable for the first few and last few observations.
It over-smoothes rapid rises and falls in the data.
Assumes the seasonal component repeats from year to year. For some data this is a reasonable assumption, but for longer series it’s not.
Occasionally the values of time series in a small number of periods may be unusual - for example an industrial dispute affecting the number of passengers on an airline. The classical method is not robust to these kinds of values.

X11 Decomposition

X11 is based on classical, but has extra steps in order to overcome some of the drawbacks.

Estimates are available for endpoint observations.
Seasonal component is allowed to vary slowly over time.

X11 can be summarised as such:

Initial estimate of the trend with a $2 \times 12\text{-MA}$.
Removal of this trend from the original series to give an estimate of the seasonal and remainder components $S_t I_t$
A preliminary estimate of the seasonal component is found by applying a weighted 6 term moving average ($S_{3\times 3}$) to $S_t I_t$.
A preliminary estimate of the adjusted data by dividing the seasonal estimate in the previous step into the original series.
A 9,13 or 23 term ‘Henderson’ moving average is applied to the seasonally adjusted values, depending on the volatility of the series (more volatile requies a longer moving average) to produce an improved trend estimate. This is divided into the original series to give a second seasonal and remainder component estimate.
Step 2 is repeated to obtain a final seasonal estimate
A final seasonally adjusted series is found by dividing the second estimate of the seasonal from the previous step into the original series.
Another Henderson moving average is applied to the final estimate of the seasonally adjusted series.
The irregulars are estimated by dividing the trend estiamtes into the seasonally adjusted data $\frac{T_t \times I_t}{\hat{T}_t} \approx I_t$

It can be useful to use subseries plots against the seasonal component. These help us to visualise the variation in the seasonal component over time

us_retail_employment %>% 
  model(STL(Employed)) %>% 
  components() %>% 
  gg_subseries(season_year) +
  labs(
    x = 'Years',
    y = 'Persons (thousands)',
    title = 'US Retail Employment - STL - Subseries'
  )

SEATS Decomposition

“SEATS” stands for “Seasonal Extraction in ARIMA Time Series”. The works only with quarterly and monthly data.

STL Decomposition

STL is a versatile and robust method for decomposing time series, standing for ‘Seasonal and Trend decompositon using Loess’.

Advantages are:

Unlike SEATS and X11, it handles any type of seasonality, not only monthly and quarterly data.
The seasonal component is allowed to change over time, and the rate of change can be controlled by the user.
The smoothness of the trend-cycle can also be controlled by the user.
It can be robust to outliers, so the occasional unual observations will not affect the estimate of the trend-cycle.

Disadvantages:

Does not handle trading day or calendar variation automatically.
Only provides facilities for additive decompositions.

Mutiplicative decomposition can be achieved by taking the log of the data, then back-tranforming the components. Decompositions between additive and multiplicative can be obtained using a Box-Cox transformation of the data with $0 \lt \lambda \lt 1$. $\lambda = 0$ is multiplicative while $\lambda = 1$ is additive.

Examples

us_retail_employment %>% 
  model(STL(Employed ~ trend(window = 7) + season(window = 'periodic'), robust = TRUE)) %>% 
  components() %>% 
  autoplot() +
  labs(
    x = 'Month',
    y = 'Persons (thousands)',
    title = 'US Retail Employment - STL Decomposition'
  )

The two main parameters are the trend-cycle window and the seasonal window. These control how rapidly those components can change. Both should be odd numbers.

Trend window is the number of consecutive observations to be used when estimating the trend-cycle.
Season window is the number of consecutive years to be used in estimating each value in the seasonal component.

Setting the seasonal window to be infinite is equivalent to forcing the seasonal component to be periodic (season(window = 'periodic')).

Bu default STL() uses a season window of 13 and a trend window of 21. However this may need to be changed. If we apply this to the retail data we see the GFC leak into the remainder component:

us_retail_employment %>% 
  model(
    `STL Window 7` = STL(Employed ~ trend(window = 7) + season(window = 'periodic'), robust = TRUE),
    `STL Default` = STL(Employed)
  ) %>% 
  components() %>%
  filter(year(Month) > 2007 & year(Month) < 2012) %>% 
  ggplot() +
  geom_line(aes(Month, remainder, colour = .model)) +
  labs(
    x = 'Month',
    y = 'Persons (thousands)',
    title = 'US Retail - STL Decomposition',
    subtitle = 'STL Default and Trend Window 7 - Remainder Components'
  )

Exercises

Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?

global_economy %>% 
  mutate(GDPpC = GDP / Population) %>%
  add_count(Country, wt = GDPpC) %>%
  drop_na(GDPpC) %>% 
  mutate(Rank = dense_rank(n)) %>% 
  filter(Rank <= 10) %>% 
  ggplot() +
  geom_line(aes(Year, GDPpC, colour = as.factor(Country))) +
  labs(
    x = 'Year',
    y = 'GDP per Capita',
    title = 'Sum of GDP per Capita - Top 5 Countries'
  )

For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.

United States GDP from global_economy

It’s most appropriate to transform this into GDP per capita

global_economy %>% 
  filter(Country == 'United States') %>% 
  mutate(GDPpC = GDP / Population) %>% 
  ggplot() +
  geom_line(aes(Year, GDPpC)) +
  labs(
    x = 'Year',
    y = 'GDP per Capita',
    title = 'United States - GDP per Capita'
  )

Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock

aus_livestock %>% 
  filter(
    Animal == 'Bulls, bullocks and steers' & State == 'Victoria') %>%  
  ggplot() +
  geom_line(aes(Month, Count))

Victorian Electricity Demand from vic_elec

We’ll normalise by temperature. The summertime spikes appear to disappear, however we do get some outliers in the winter months. The overall variation appears to have reduced as well.

vic_elec %>% 
  mutate(`Demand/Temp` = Demand / Temperature) %>% 
  pivot_longer(c(Demand, `Demand/Temp`), names_to = 'Measure', values_to = 'Demand') %>% 
  ggplot() +
  geom_line(aes(Time, Demand)) +
  facet_grid(rows = vars(Measure), scales = 'free') +
  labs(
    x = 'Time (30 mins)',
    y = 'Demand (MW and MW/degree celcius)',
    title = 'Victorian Electricity Demand',
    subtitle = 'Raw and Temperature Normalised'
  )

Gas production from aus_production

We’ll use a Box-Cox transformation.

lambda <- aus_production %>% 
    features(Gas, features = guerrero) %>% 
    pull(lambda_guerrero)

aus_production %>%
    mutate('Gas Box-Cox Adjusted' = box_cox(Gas, lambda)) %>% 
    gather('Type', 'Value', Gas, `Gas Box-Cox Adjusted`) %>% 
    ggplot(aes(Quarter, Value)) +
    geom_line() +
    facet_grid(vars(Type), scales = 'free') +
    labs(
        x = 'Quarter',
        y = 'Gas Production (petajoules)',
        title = 'Australian Gas Production (Box-Cox adjusted)'
    )

Why is a Box-Cox transformation unhelpful for the canadian_gas data?

Box-Cox is used when the variation of a time series increases or decreases with time. By aplying the transformation, this variation can be reduced.

Let’s take a look at the Canadian gas data and apply a Box-Cox:

lambda <- canadian_gas %>% 
    features(Volume, features = guerrero) %>% 
    pull(lambda_guerrero)

canadian_gas %>% 
    mutate('Gas Box-Cox Adjusted' = box_cox(Volume, lambda)) %>% 
    gather('Type', 'Value', Volume, `Gas Box-Cox Adjusted`) %>% 
    ggplot(aes(Month, Value)) +
    geom_line() +
    facet_grid(vars(Type), scales = 'free') +
    labs(
        x = 'Quarter',
        y = 'Gas Production (petajoules)',
        title = 'Australian Gas Production (Box-Cox adjusted)'
    )

We see that the variation incrases, but then decreases with time. This means a Box-Cox is not going to help.

What Box-Cox transformation would you select for your retail data (from Exercise 6 in Section 2.10)?

The data is aus_retail, the Box-Cox reduces a small amount of the increasing variation in the time series.

retail_subset <-
  aus_retail %>% 
  filter(`Series ID` == 'A3349849A')

lambda <- retail_subset %>% 
  features(Turnover, features = guerrero) %>% 
  pull(lambda_guerrero)

retail_subset %>% 
  mutate(`Turnover Box-Cox` = box_cox(Turnover, lambda)) %>% 
  pivot_longer(c(Turnover, `Turnover Box-Cox`), names_to = 'Transform', values_to = 'Turnover') %>% 
  ggplot() +
  geom_line(aes(Month, Turnover)) +
  facet_grid(rows = vars(Transform), scales = 'free')

Show that a 3×5 MA is equivalent to a 7-term weighted moving average with weights of 0.067, 0.133, 0.200, 0.200, 0.200, 0.133, and 0.067.

With a single $5\text{-MA}$, $5 = 2k + 1$, thus $k = 2$, so our $t$ indicies go from $[-2, 2]$.

\[ \hat{T}_{t(5)} = \frac{1}{5}( y_{t-2} + y_{t-1} + y_{t} + y_{t+1} + y_{t+2} ) \]

If we then apply a $3\text{-MA}$ to this, our $k = 1$, with our indicies going from [-1,1]. This means we’ll have:

\[ \hat{T}_t = \frac{1}{3}\big[ \\ \frac{1}{5}( y_{t-3} + y_{t-2} + y_{t-1} + y_{t} + y_{t+1} ) + \\ \frac{1}{5}( y_{t-2} + y_{t-1} + y_{t} + y_{t+1} + y_{t+2} ) + \\ \frac{1}{5}( y_{t-1} + y_{t} + y_{t+1} + y_{t+2} + y_{t+3} ) + \\ \big] \]

Summing up the indicies, there’s $1 \times y_{t-3}$, $2 \times y_{t-2}$, etc.

c(
  1/3 *  1/5,
  2 * (1/3) * (1/5),
  3 * (1/3) * (1/5),
  3 * (1/3) * (1/5),
  3 * (1/3) * (1/5),
  2 * (1/3) * (1/5),
  1/3 *  1/5
)

## [1] 0.06666667 0.13333333 0.20000000 0.20000000 0.20000000 0.13333333
## [7] 0.06666667

This is the same as a weighted moving average with these coefficients.

he fma::plastics data set consists of the monthly sales (in thousands) of product A for a plastics manufacturer for five years.

Plot the time series of sales of product A. Can you identify seasonal fluctuations and/or a trend-cycle?

prod_a_sales <-
  fma::plastics %>% 
  as_tsibble() %>%  
  rename(Month = index, Sales = value)
    
prod_a_sales %>% 
  ggplot() +
  geom_line(aes(Month, Sales)) +
  labs(
    title = 'Plastics Sales - Product A'
  )

From this we can see a generate trend upwards (no cycles), with a yearly seasonal component.

Use a classical multiplicative decomposition to calculate the trend-cycle and seasonal indices.

prod_a_sales %>% 
  model(classical_decomposition(Sales, type = 'multiplicative')) %>% 
  components() %>%
  autoplot()

## Warning: Removed 6 row(s) containing missing values (geom_path).

Do the results upport the graphical interpretation?

Yes the results do support it.

Plot the seasonally adjusted data

We’ll use an STL.

prod_a_sales %>% 
  model(STL(Sales)) %>% 
  components() %>% 
  ggplot() +
  geom_line(aes(Month, season_adjust)) +
  labs(
    y = 'Sales',
    title = 'Seasonally Adjusted Sales'
  )

Change one observation to be an outlier (e.g., add 500 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?

prod_a_sales_outlier <- prod_a_sales
prod_a_sales_outlier[3,2] <- prod_a_sales_outlier[3,2] + 500

prod_a_sales_outlier %>%
  model(STL(Sales)) %>% 
  components() %>% 
  ggplot() +
  geom_line(aes(Month, season_adjust)) +
  labs(
    y = 'Sales',
    title = 'Seasonally Adjusted Sales',
    subtitle = 'Early Outlier'
  )

prod_a_sales_outlier[3,2] <- prod_a_sales_outlier[3,2] - 500
prod_a_sales_outlier[50,2] <- prod_a_sales_outlier[50,2] + 500

prod_a_sales_outlier %>%
  model(STL(Sales)) %>% 
  components() %>% 
  ggplot() +
  geom_line(aes(Month, season_adjust)) +
  labs(
    y = 'Sales',
    title = 'Seasonally Adjusted Sales',
    subtitle = 'Late Outlier'
  )

With STL the outlier has a large effect on the seasonally adjusted sales. Appears to have a lower effect on the adjusted adjusted sales later.

Figures 3.16 and 3.17 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.

aus_labour_comp <-
  labour %>% 
  as_tsibble() %>% 
  model(STL(value)) %>% 
  components()

aus_labour_comp %>% 
  autoplot() +
  labs(
    x = 'Year',
    y = 'Persons (thousands)',
    title = 'STL Decomposition - Australian Labour Force'
  )

aus_labour_comp %>% 
  gg_subseries(season_year)

Write about 3–5 sentences describing the results of the decomposition.

We first take a look at the seasonal component of labour. THere is a large rise just before Christmas, with a drop off over the summar holidays. This picks back up again, slowly diminishing into Winter.

We don’t see any cycles over the long term, but we do see a flattening around 1991/1992. This lines up with the recession that ocurred then.

This exercise uses the canadian_gas data (monthly Canadian gas production in billions of cubic metres, January 1960 – February 2005).

Plot the data using autoplot(), gg_subseries() and gg_season() to look at the effect of the changing seasonality over time. What do you think is causing it to change so much?

canadian_gas %>% 
  autoplot()

## Plot variable not specified, automatically selected `.vars = Volume`

canadian_gas %>%
  gg_subseries()

## Plot variable not specified, automatically selected `y = Volume`

canadian_gas %>% 
  gg_season()

## Plot variable not specified, automatically selected `y = Volume`

Do an STL decomposition of the data. You will need to choose a seasonal window to allow for the changing shape of the seasonal component.

canadian_gas %>% 
  model(
    Periodic = STL(Volume ~ season(window = 'periodic')),
    Default = STL(Volume)
  ) %>% 
  components() %>% 
  autoplot()

How does the seasonal shape change over time? [Hint: Try plotting the seasonal component using gg_seas).]

The volume of gas production was flat in the 1970s, however it has changed to be much more variable within a season.

Can you produce a plausible seasonally adjusted series?

canadian_gas %>% 
  model(
    Periodic = STL(Volume ~ season(window = 'periodic')),
    Default = STL(Volume)
  ) %>% 
  components() %>% 
  ggplot() +
  geom_line(aes(Month, season_adjust)) +
  facet_grid(rows = vars(.model))

The change in variability of the season in the 1980s will makes it difficult to produce a seasonally adjusted series.

Chapter 3 - Time Series Decomposition

Greg Foletta

2020-06-29