In the video, you saw various types of data. In this exercise, you will plot additional time series data and compare them to the series in the video. In particular, concentrate on the type of trend, seasonality or periodicity, and homoscedasticity. Before you use a data set for the first time, use the help system to see its details. For example, use help(AirPassengers) or ?AirPassengers to see the details of the series. The packages astsa and xts are preloaded in your R environment.
library(xts)
package 'xts' was built under R version 4.0.3
Loading required package: zoo
package 'zoo' was built under R version 4.0.3
Attaching package: 'zoo'
The following objects are masked from 'package:base':
    as.Date, as.Date.numeric
library(astsa)
package 'astsa' was built under R version 4.0.3
# View a detailed description of AirPassengers
help(AirPassengers)
# Plot AirPassengers
plot(AirPassengers)
# Plot the DJIA daily closings
plot(djia$Close)
# Plot the Southern Oscillation Index
plot(soi)
As you can see, the AirPassengers dataset contains monthly information on airline passengers from 1949 through 1960. Note that when you plot ts objects using plot(), the data will automatically be plotted over time. The AirPassengers data show a handful of important qualities, including seasonality, trend, and heteroscedasticity, which distinguish the data from standard white noise.
When a time series is trend stationary, it has stationary behavior around a trend. A simple example is Yt = α + βt + Xt, where Xt is stationary.
A different type of model for trend is the random walk, which has the form Xt = Xt−1 + Wt, where Wt is white noise. It is called a random walk because at time t the process is where it was at time t−1 plus a completely random movement. For a random walk with drift, a constant δ is added to the model, Xt = δ + Xt−1 + Wt, and the constant causes the random walk to drift in its direction (positive or negative).
We simulated and plotted data from these models. Note the difference in the behavior of the two models.
plot(cbind(y, x))
In both cases, simple differencing can remove the trend and coerce the data to stationarity. Differencing looks at the difference between the value of a time series at a certain point in time and its preceding value. That is, Xt−Xt−1 is computed.
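To see why this works, here is a minimal sketch (using hypothetical simulated data, not the course datasets) showing that differencing a series with a linear trend leaves only stationary fluctuations:

```r
set.seed(1)               # hypothetical example for illustration
t <- 1:100
x <- 2 * t + rnorm(100)   # trend stationary: linear trend plus white noise
d <- diff(x)              # d[t] = x[t] - x[t-1]
plot(d)                   # fluctuates around the slope (2) with no trend
```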
To check that it works, you will difference each generated time series and plot the detrended series. If a time series is in x, then diff(x) will have the detrended series obtained by differencing the data. To plot the detrended series, simply use plot(diff(x)).
# Plot detrended y (trend stationary)
plot(diff(y))
# Plot detrended x (random walk)
plot(diff(x))
As you can see, differencing both your trend stationary data and your random walk data has the effect of removing the trends, despite the important differences between the two datasets.
As you have seen in the previous exercise, differencing is generally good for removing trend from time series data. Recall that differencing looks at the difference between the value of a time series at a certain point in time and its preceding value.
In this exercise, you will use differencing diff() to detrend and plot real time series data.
library(astsa)
# Plot globtemp and detrended globtemp
par(mfrow = c(2,1))
plot(globtemp)
plot(diff(globtemp))
# Plot cmort and detrended cmort
#weekly cardiovascular mortality in Los Angeles County
par(mfrow = c(2,1))
plot(cmort)
plot(diff(cmort))
Differencing is a great way to remove trends from your data.
Here, we will coerce nonstationary data to stationarity by calculating the return or growth rate as follows.
Often time series are generated as
Xt = (1 + pt) Xt−1,
meaning that the value of the time series observed at time t equals the value observed at time t−1 plus a small percent change pt at time t.
A simple deterministic example is putting money into a bank with a fixed interest rate r. In this case, Xt = (1 + r) Xt−1 is the value of the account at time period t with an initial deposit of X0.
Typically, pt is referred to as the return or growth rate of a time series, and this process is often stable.
For reasons that are outside the scope of this course, it can be shown that the growth rate pt can be approximated by
Yt = log Xt − log Xt−1 ≈ pt.
In R, pt is often calculated as diff(log(x)), and plotting it can be done in one line: plot(diff(log(x))).
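As a quick sanity check that diff(log(x)) approximates the percent change, compare it to the exact return on a short hypothetical series:

```r
x <- c(100, 102, 105, 103)        # hypothetical price series
exact  <- diff(x) / head(x, -1)   # exact returns: (Xt - Xt-1) / Xt-1
approx <- diff(log(x))            # log returns: log(Xt) - log(Xt-1)
round(cbind(exact, approx), 4)    # the two columns agree to a few decimals
```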
# astsa and xts are preloaded
# Plot GNP series (gnp) and its growth rate
par(mfrow = c(2,1))
plot(gnp)
plot(diff(log(gnp)))
# Plot DJIA closings (djia$Close) and its returns
par(mfrow = c(2,1))
plot(djia$Close)
plot(diff(log(djia$Close)))
Once again, by combining a few commands to manipulate your data, you can coerce otherwise nonstationary data to stationarity.
Any stationary time series can be written as a linear combination of white noise. In addition, any ARMA model has this form, so it is a good choice for modeling stationary time series.
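This linear-combination (psi-weight) form can be computed directly with ARMAtoMA() from the stats package; a small sketch for an AR(1):

```r
# Psi-weights of the AR(1) Xt = .9 Xt-1 + Wt, i.e.
# Xt = Wt + .9 Wt-1 + .81 Wt-2 + ... (geometric decay)
ARMAtoMA(ar = .9, ma = 0, lag.max = 5)
```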
R provides a simple function called arima.sim() to generate data from an ARMA model. For example, the syntax for generating 100 observations from an MA(1) with parameter .9 is
arima.sim(model = list(order = c(0, 0, 1), ma = .9 ), n = 100)
You can also use order = c(0, 0, 0) to generate white noise.
In this exercise, you will generate data from various ARMA models. For each command, generate 200 observations and plot the result.
# Generate and plot white noise
WN <- arima.sim(model = list(order = c(0, 0, 0)), n = 200)
plot(WN)
# Generate and plot an MA(1) with parameter .9
MA <- arima.sim(model = list(order = c(0, 0, 1), ma = .9), n = 200)
plot(MA)
# Generate and plot an AR(2) with parameters 1.5 and -.75
AR <- arima.sim(model = list(order = c(2, 0, 0), ar = c(1.5, -.75)), n = 200)
plot(AR)
The arima.sim() command is a very useful way to quickly simulate time series data.
Recall that you use the ACF and PACF pair to help identify the orders p and q of an ARMA model. The following table is a summary of the results:

          AR(p)              MA(q)              ARMA(p,q)
ACF:      tails off          cuts off at lag q  tails off
PACF:     cuts off at lag p  tails off          tails off
In this exercise, you will generate data from the AR(1) model,
Xt = .9 Xt−1 + Wt,
look at the simulated data and the sample ACF and PACF pair to determine the order. Then, you will fit the model and compare the estimated parameters to the true parameters.
Throughout this course, you will be using sarima() from the astsa package to easily fit models to data.
# Generate 100 observations from the AR(1) model
x <- arima.sim(model = list(order = c(1, 0, 0), ar = .9), n = 100)
# Plot the generated data
plot(x)
# Plot the sample P/ACF pair
acf2(x)
# Fit an AR(1) to the data and examine the t-table
sarima(x, p = 1, d = 0, q = 0)
initial value 0.801648
iter 2 value 0.046817
iter 3 value 0.046669
iter 4 value 0.046665
iter 5 value 0.046652
iter 6 value 0.046641
iter 7 value 0.046638
iter 8 value 0.046638
iter 9 value 0.046637
iter 10 value 0.046636
iter 11 value 0.046636
iter 12 value 0.046636
iter 12 value 0.046636
iter 12 value 0.046636
final value 0.046636
converged
initial value 0.052570
iter 2 value 0.052500
iter 3 value 0.052152
iter 4 value 0.052127
iter 5 value 0.052104
iter 6 value 0.052089
iter 7 value 0.052089
iter 8 value 0.052088
iter 9 value 0.052088
iter 10 value 0.052088
iter 10 value 0.052088
iter 10 value 0.052088
final value 0.052088
converged
$fit
Call:
stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
Q), period = S), xreg = xmean, include.mean = FALSE, transform.pars = trans,
fixed = fixed, optim.control = list(trace = trc, REPORT = 1, reltol = tol))
Coefficients:
ar1 xmean
0.8842 -0.1129
s.e. 0.0455 0.8493
sigma^2 estimated as 1.093: log likelihood = -147.1, aic = 300.21
$degrees_of_freedom
[1] 98
$ttable
Estimate SE t.value p.value
ar1 0.8842 0.0455 19.416 0.0000
xmean -0.1129 0.8493 -0.133 0.8945
$AIC
[1] 3.002053
$AICc
[1] 3.00329
$BIC
[1] 3.080208
As you can see, the sarima() command provides extensive output to understand the results of your model fit. What do you glean from this output?
For this exercise, we generated data from the AR(2) model,
Xt = 1.5 Xt−1 − .75 Xt−2 + Wt,
using
x <- arima.sim(model = list(order = c(2, 0, 0), ar = c(1.5, -.75)), n = 200)
Look at the simulated data and the sample ACF and PACF pair to determine the model order. Then fit the model and compare the estimated parameters to the true parameters.
# astsa is preloaded
# Plot x
plot(x)
# Plot the sample P/ACF of x
acf2(x)
# Fit an AR(2) to the data and examine the t-table
sarima(x, p = 2, d = 0, q = 0)
initial value 1.057297
iter 2 value 0.928302
iter 3 value 0.510843
iter 4 value 0.277768
iter 5 value 0.087353
iter 6 value 0.001045
iter 7 value -0.022934
iter 8 value -0.033359
iter 9 value -0.033369
iter 10 value -0.033379
iter 11 value -0.033384
iter 12 value -0.033386
iter 13 value -0.033386
iter 13 value -0.033386
iter 13 value -0.033386
final value -0.033386
converged
initial value -0.026185
iter 2 value -0.026201
iter 3 value -0.026218
iter 4 value -0.026218
iter 5 value -0.026218
iter 6 value -0.026218
iter 6 value -0.026218
iter 6 value -0.026218
final value -0.026218
converged
$fit
Call:
stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
Q), period = S), xreg = xmean, include.mean = FALSE, transform.pars = trans,
fixed = fixed, optim.control = list(trace = trc, REPORT = 1, reltol = tol))
Coefficients:
ar1 ar2 xmean
1.5122 -0.7691 -0.2540
s.e. 0.0455 0.0456 0.2662
sigma^2 estimated as 0.9343: log likelihood = -278.54, aic = 565.09
$degrees_of_freedom
[1] 197
$ttable
Estimate SE t.value p.value
ar1 1.5122 0.0455 33.2000 0.0000
ar2 -0.7691 0.0456 -16.8523 0.0000
xmean -0.2540 0.2662 -0.9542 0.3411
$AIC
[1] 2.82544
$AICc
[1] 2.826053
$BIC
[1] 2.891407
As you can see from the t-table output, the estimates produced by the sarima() command are close to the true parameters.
In this exercise, we generated data from the MA(1) model,
Xt = Wt − .8 Wt−1,
using
x <- arima.sim(model = list(order = c(0, 0, 1), ma = -.8), n = 100)
Look at the simulated data and the sample ACF and PACF to determine the order based on the table given in the first exercise. Then fit the model.
Recall that for pure MA(q) models, the theoretical ACF will cut off at lag q while the PACF will tail off.
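You can check this against the theoretical ACF and PACF, available via ARMAacf() from the stats package; a sketch for the MA(1) with parameter -.8 used here:

```r
# Theoretical ACF of an MA(1): nonzero at lag 1, exactly zero afterwards
ARMAacf(ma = -.8, lag.max = 4)
# Theoretical PACF of the same model tails off rather than cutting off
ARMAacf(ma = -.8, lag.max = 4, pacf = TRUE)
```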
# astsa is preloaded
# Plot x
plot(x)
# Plot the sample P/ACF of x
acf2(x)
# Fit an MA(1) to the data and examine the t-table
sarima(x, p = 0, d = 0, q = 1)
initial value 0.194276
iter 2 value -0.032690
iter 3 value -0.070984
iter 4 value -0.072938
iter 5 value -0.078212
iter 6 value -0.078424
iter 7 value -0.078545
iter 8 value -0.078601
iter 9 value -0.078606
iter 10 value -0.078606
iter 10 value -0.078606
iter 10 value -0.078606
final value -0.078606
converged
initial value -0.077047
iter 2 value -0.077086
iter 3 value -0.077088
iter 4 value -0.077089
iter 5 value -0.077089
iter 5 value -0.077089
iter 5 value -0.077089
final value -0.077089
converged
$fit
Call:
stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
Q), period = S), xreg = xmean, include.mean = FALSE, transform.pars = trans,
fixed = fixed, optim.control = list(trace = trc, REPORT = 1, reltol = tol))
Coefficients:
ma1 xmean
-0.7590 0.0186
s.e. 0.0543 0.0229
sigma^2 estimated as 0.8498: log likelihood = -134.19, aic = 274.37
$degrees_of_freedom
[1] 98
$ttable
Estimate SE t.value p.value
ma1 -0.7590 0.0543 -13.9736 0.0000
xmean 0.0186 0.0229 0.8125 0.4185
$AIC
[1] 2.7437
$AICc
[1] 2.744937
$BIC
[1] 2.821855
Once again, the parameter estimates produced by sarima() come quite close to the input specified when the data was created (-.8).
You are now ready to merge the AR model and the MA model into the ARMA model. We generated data from the ARMA(2,1) model,
Xt = Xt−1 − .9 Xt−2 + Wt + .8 Wt−1,
using
x <- arima.sim(model = list(order = c(2, 0, 1), ar = c(1, -.9), ma = .8), n = 250)
Look at the simulated data and the sample ACF and PACF pair to determine a possible model.
Recall that for ARMA(p,q) models, both the theoretical ACF and PACF tail off. In this case, the orders are difficult to discern from data and it may not be clear if either the sample ACF or sample PACF is cutting off or tailing off. In this case, you know the actual model orders, so fit an ARMA(2,1) to the generated data. General modeling strategies will be discussed further in the course.
# astsa is preloaded
# Plot x
plot(x)
# Plot the sample P/ACF of x
acf2(x)
# Fit an ARMA(2,1) to the data and examine the t-table
sarima(x, p = 2, d = 0, q = 1)
initial value 1.404294
iter 2 value 0.614164
iter 3 value 0.388423
iter 4 value 0.209345
iter 5 value 0.032239
iter 6 value 0.025971
iter 7 value 0.023637
iter 8 value 0.016583
iter 9 value 0.013160
iter 10 value 0.012996
iter 11 value 0.012995
iter 12 value 0.012995
iter 13 value 0.012995
iter 14 value 0.012995
iter 15 value 0.012995
iter 15 value 0.012995
iter 15 value 0.012995
final value 0.012995
converged
initial value 0.021798
iter 2 value 0.021779
iter 3 value 0.021767
iter 4 value 0.021723
iter 5 value 0.021690
iter 6 value 0.021686
iter 7 value 0.021685
iter 8 value 0.021685
iter 8 value 0.021685
iter 8 value 0.021685
final value 0.021685
converged
$fit
Call:
stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
Q), period = S), xreg = xmean, include.mean = FALSE, transform.pars = trans,
fixed = fixed, optim.control = list(trace = trc, REPORT = 1, reltol = tol))
Coefficients:
ar1 ar2 ma1 xmean
0.9962 -0.8884 0.7773 -0.0435
s.e. 0.0290 0.0284 0.0354 0.1274
sigma^2 estimated as 1.019: log likelihood = -360.16, aic = 730.31
$degrees_of_freedom
[1] 246
$ttable
Estimate SE t.value p.value
ar1 0.9962 0.0290 34.3776 0.0000
ar2 -0.8884 0.0284 -31.3280 0.0000
ma1 0.7773 0.0354 21.9283 0.0000
xmean -0.0435 0.1274 -0.3413 0.7332
$AIC
[1] 2.921248
$AICc
[1] 2.921901
$BIC
[1] 2.991677
As you can see, the sarima() command can estimate parameter values for many different types of models, including AR, MA, and ARMA.
Look at (1) the data plots and (2) the sample ACF and PACF of the logged and differenced varve series:
plot(varve)
dl_varve <- diff(log(varve))
plot(dl_varve)
A varve is an annual layer of sediment or sedimentary rock. The data contains sedimentary deposits from one location in Massachusetts for 634 years, beginning nearly 12,000 years ago.
acf2(varve)
acf2(dl_varve)
Remember that an MA(q) model has an ACF that cuts off at lag q and a PACF that tails off. In this case, the ACF cuts off at lag 1 and the PACF tails off, suggesting an MA(1) model.
Based on the sample P/ACF pair of the logged and differenced varve data (dl_varve), an MA(1) was indicated. The best approach to fitting ARMA models is to start with a low-order model and then add one parameter at a time, checking whether the results change.
In this exercise, you will fit various models to the dl_varve data and note the AIC and BIC for each model. In the next exercise, you will use these AICs and BICs to choose a model. Remember that you want to retain the model with the smallest AIC and/or BIC value.
A note before you start:
sarima(x, p = 0, d = 0, q = 1) and sarima(x, 0, 0, 1)
are the same.
Use sarima() to fit the different models to dl_varve. Take a close look at the output of your sarima() command to see the AIC and BIC for this model.
# Fit an MA(1) to dl_varve.
sarima(dl_varve, 0, 0, 1)
initial value -0.551780
iter 2 value -0.671633
iter 3 value -0.706234
iter 4 value -0.707586
iter 5 value -0.718543
iter 6 value -0.719692
iter 7 value -0.721967
iter 8 value -0.722970
iter 9 value -0.723231
iter 10 value -0.723247
iter 11 value -0.723248
iter 12 value -0.723248
iter 12 value -0.723248
iter 12 value -0.723248
final value -0.723248
converged
initial value -0.722762
iter 2 value -0.722764
iter 3 value -0.722764
iter 4 value -0.722765
iter 4 value -0.722765
iter 4 value -0.722765
final value -0.722765
converged
$fit
Call:
stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
Q), period = S), xreg = xmean, include.mean = FALSE, transform.pars = trans,
fixed = fixed, optim.control = list(trace = trc, REPORT = 1, reltol = tol))
Coefficients:
ma1 xmean
-0.7710 -0.0013
s.e. 0.0341 0.0044
sigma^2 estimated as 0.2353: log likelihood = -440.68, aic = 887.36
$degrees_of_freedom
[1] 631
$ttable
Estimate SE t.value p.value
ma1 -0.7710 0.0341 -22.6002 0.0000
xmean -0.0013 0.0044 -0.2818 0.7782
$AIC
[1] 1.401826
$AICc
[1] 1.401856
$BIC
[1] 1.422918
# Fit an MA(2) to dl_varve. Improvement?
sarima(dl_varve, 0, 0, 2)
initial value -0.551780
iter 2 value -0.679736
iter 3 value -0.728605
iter 4 value -0.734640
iter 5 value -0.735449
iter 6 value -0.735979
iter 7 value -0.736015
iter 8 value -0.736059
iter 9 value -0.736060
iter 10 value -0.736060
iter 11 value -0.736061
iter 12 value -0.736061
iter 12 value -0.736061
iter 12 value -0.736061
final value -0.736061
converged
initial value -0.735372
iter 2 value -0.735378
iter 3 value -0.735379
iter 4 value -0.735379
iter 4 value -0.735379
iter 4 value -0.735379
final value -0.735379
converged
$fit
Call:
stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
Q), period = S), xreg = xmean, include.mean = FALSE, transform.pars = trans,
fixed = fixed, optim.control = list(trace = trc, REPORT = 1, reltol = tol))
Coefficients:
ma1 ma2 xmean
-0.6710 -0.1595 -0.0013
s.e. 0.0375 0.0392 0.0033
sigma^2 estimated as 0.2294: log likelihood = -432.69, aic = 873.39
$degrees_of_freedom
[1] 630
$ttable
Estimate SE t.value p.value
ma1 -0.6710 0.0375 -17.9057 0.0000
ma2 -0.1595 0.0392 -4.0667 0.0001
xmean -0.0013 0.0033 -0.4007 0.6888
$AIC
[1] 1.379757
$AICc
[1] 1.379817
$BIC
[1] 1.40788
# Fit an ARMA(1,1) to dl_varve. Improvement?
sarima(dl_varve, 1, 0, 1)
initial value -0.550994
iter 2 value -0.648962
iter 3 value -0.676965
iter 4 value -0.699167
iter 5 value -0.724554
iter 6 value -0.726719
iter 7 value -0.729066
iter 8 value -0.731976
iter 9 value -0.734235
iter 10 value -0.735969
iter 11 value -0.736410
iter 12 value -0.737045
iter 13 value -0.737600
iter 14 value -0.737641
iter 15 value -0.737643
iter 16 value -0.737643
iter 17 value -0.737643
iter 18 value -0.737643
iter 18 value -0.737643
iter 18 value -0.737643
final value -0.737643
converged
initial value -0.737522
iter 2 value -0.737527
iter 3 value -0.737528
iter 4 value -0.737529
iter 5 value -0.737530
iter 5 value -0.737530
iter 5 value -0.737530
final value -0.737530
converged
$fit
Call:
stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
Q), period = S), xreg = xmean, include.mean = FALSE, transform.pars = trans,
fixed = fixed, optim.control = list(trace = trc, REPORT = 1, reltol = tol))
Coefficients:
ar1 ma1 xmean
0.2341 -0.8871 -0.0013
s.e. 0.0518 0.0292 0.0028
sigma^2 estimated as 0.2284: log likelihood = -431.33, aic = 870.66
$degrees_of_freedom
[1] 630
$ttable
Estimate SE t.value p.value
ar1 0.2341 0.0518 4.5184 0.0000
ma1 -0.8871 0.0292 -30.4107 0.0000
xmean -0.0013 0.0028 -0.4618 0.6444
$AIC
[1] 1.375456
$AICc
[1] 1.375517
$BIC
[1] 1.403579
AIC and BIC help you find the model with the smallest error using the fewest parameters. The idea is based on the parsimony principle, which is basic to all science and tells you to choose the simplest scientific explanation that fits the evidence.
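If you want to compare the criteria programmatically, the same kind of fit can be done with stats::arima() and scored with AIC() and BIC(). Note that sarima() appears to report these criteria divided by the sample size (compare aic = 887.36 with $AIC = 1.4018 for n = 633 above), so the absolute numbers differ, but the model ranking is what matters. A minimal sketch on hypothetical simulated data:

```r
set.seed(42)                              # hypothetical data, not dl_varve
y <- arima.sim(model = list(order = c(1, 0, 1), ar = .2, ma = -.9), n = 500)
fit_ma1  <- arima(y, order = c(0, 0, 1))  # candidate MA(1)
fit_arma <- arima(y, order = c(1, 0, 1))  # candidate ARMA(1,1)
c(AIC(fit_ma1), AIC(fit_arma))            # smaller is better
c(BIC(fit_ma1), BIC(fit_arma))
```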
Of the three models, the ARMA(1,1) has the lowest AIC and BIC, so both criteria prefer it over the MA(1) and MA(2) models.
Each sarima() run includes a residual analysis graphic. Specifically, the output shows (1) the standardized residuals, (2) the sample ACF of the residuals, (3) a normal Q-Q plot, and (4) the p-values corresponding to the Box-Ljung-Pierce Q-statistic.
In each run, check the four residual plots as follows:
# Fit an MA(1) to dl_varve. Examine the residuals
sarima(dl_varve, 0, 0 ,1)
initial value -0.551780
iter 2 value -0.671633
iter 3 value -0.706234
iter 4 value -0.707586
iter 5 value -0.718543
iter 6 value -0.719692
iter 7 value -0.721967
iter 8 value -0.722970
iter 9 value -0.723231
iter 10 value -0.723247
iter 11 value -0.723248
iter 12 value -0.723248
iter 12 value -0.723248
iter 12 value -0.723248
final value -0.723248
converged
initial value -0.722762
iter 2 value -0.722764
iter 3 value -0.722764
iter 4 value -0.722765
iter 4 value -0.722765
iter 4 value -0.722765
final value -0.722765
converged
$fit
Call:
stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
Q), period = S), xreg = xmean, include.mean = FALSE, transform.pars = trans,
fixed = fixed, optim.control = list(trace = trc, REPORT = 1, reltol = tol))
Coefficients:
ma1 xmean
-0.7710 -0.0013
s.e. 0.0341 0.0044
sigma^2 estimated as 0.2353: log likelihood = -440.68, aic = 887.36
$degrees_of_freedom
[1] 631
$ttable
Estimate SE t.value p.value
ma1 -0.7710 0.0341 -22.6002 0.0000
xmean -0.0013 0.0044 -0.2818 0.7782
$AIC
[1] 1.401826
$AICc
[1] 1.401856
$BIC
[1] 1.422918
# Fit an ARMA(1,1) to dl_varve. Examine the residuals
sarima(dl_varve, 1, 0 ,1)
initial value -0.550994
iter 2 value -0.648962
iter 3 value -0.676965
iter 4 value -0.699167
iter 5 value -0.724554
iter 6 value -0.726719
iter 7 value -0.729066
iter 8 value -0.731976
iter 9 value -0.734235
iter 10 value -0.735969
iter 11 value -0.736410
iter 12 value -0.737045
iter 13 value -0.737600
iter 14 value -0.737641
iter 15 value -0.737643
iter 16 value -0.737643
iter 17 value -0.737643
iter 18 value -0.737643
iter 18 value -0.737643
iter 18 value -0.737643
final value -0.737643
converged
initial value -0.737522
iter 2 value -0.737527
iter 3 value -0.737528
iter 4 value -0.737529
iter 5 value -0.737530
iter 5 value -0.737530
iter 5 value -0.737530
final value -0.737530
converged
$fit
Call:
stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
Q), period = S), xreg = xmean, include.mean = FALSE, transform.pars = trans,
fixed = fixed, optim.control = list(trace = trc, REPORT = 1, reltol = tol))
Coefficients:
ar1 ma1 xmean
0.2341 -0.8871 -0.0013
s.e. 0.0518 0.0292 0.0028
sigma^2 estimated as 0.2284: log likelihood = -431.33, aic = 870.66
$degrees_of_freedom
[1] 630
$ttable
Estimate SE t.value p.value
ar1 0.2341 0.0518 4.5184 0.0000
ma1 -0.8871 0.0292 -30.4107 0.0000
xmean -0.0013 0.0028 -0.4618 0.6444
$AIC
[1] 1.375456
$AICc
[1] 1.375517
$BIC
[1] 1.403579
The residuals for the MA(1) model are not white noise. The residuals for the ARMA(1, 1) model appear to be Gaussian white noise.
By now you have gained considerable experience fitting ARMA models to data, but before you start celebrating, try one more exercise (sort of) on your own.
The data in oil are crude oil, WTI spot price FOB (in dollars per barrel), weekly data from 2000 to 2008. Use your skills to fit an ARMA model to the returns. The weekly crude oil prices (oil) are plotted for you. Throughout the exercise, work with the returns, which you will calculate.
plot(oil)
# Calculate approximate oil returns
oil_returns <- diff(log(oil))
# Plot oil_returns. Notice the outliers.
plot(oil_returns)
# Plot the P/ACF pair for oil_returns
acf2(oil_returns)
# Assuming both P/ACF are tailing off, fit a model to oil_returns
sarima(oil_returns, 1, 0, 1)
initial value -3.057594
iter 2 value -3.061420
iter 3 value -3.067360
iter 4 value -3.067479
iter 5 value -3.071834
iter 6 value -3.074359
iter 7 value -3.074843
iter 8 value -3.076656
iter 9 value -3.080467
iter 10 value -3.081546
iter 11 value -3.081603
iter 12 value -3.081615
iter 13 value -3.081642
iter 14 value -3.081643
iter 14 value -3.081643
iter 14 value -3.081643
final value -3.081643
converged
initial value -3.082345
iter 2 value -3.082345
iter 3 value -3.082346
iter 4 value -3.082346
iter 5 value -3.082346
iter 5 value -3.082346
iter 5 value -3.082346
final value -3.082346
converged
$fit
Call:
stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
Q), period = S), xreg = xmean, include.mean = FALSE, transform.pars = trans,
fixed = fixed, optim.control = list(trace = trc, REPORT = 1, reltol = tol))
Coefficients:
ar1 ma1 xmean
-0.5264 0.7146 0.0018
s.e. 0.0871 0.0683 0.0022
sigma^2 estimated as 0.002102: log likelihood = 904.89, aic = -1801.79
$degrees_of_freedom
[1] 541
$ttable
Estimate SE t.value p.value
ar1 -0.5264 0.0871 -6.0422 0.0000
ma1 0.7146 0.0683 10.4699 0.0000
xmean 0.0018 0.0022 0.7981 0.4252
$AIC
[1] -3.312109
$AICc
[1] -3.312027
$BIC
[1] -3.280499
A time series is called ARIMA(p,d,q) if the differenced series (of order d) is ARMA(p,q).
To get a sense of how the model works, you will analyze simulated data from the integrated model
Yt = .9 Yt−1 + Wt,
where Yt = Xt − Xt−1. In this case, the model is an ARIMA(1,1,0) because the differenced data are an autoregression of order one.
The simulated time series is in x and it was generated in R as
x <- arima.sim(model = list(order = c(1, 1, 0), ar = .9), n = 200)
You will plot the generated data and the sample ACF and PACF of the generated data to see how integrated data behave. Then, you will difference the data to make it stationary. You will plot the differenced data and the corresponding sample ACF and PACF to see how differencing makes a difference.
# Plot x
plot(x)
# Plot the P/ACF pair of x
acf2(x)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
ACF 0.98 0.96 0.94 0.92 0.90 0.87 0.85 0.82 0.79 0.76 0.74
PACF 0.98 -0.07 -0.06 -0.06 -0.05 -0.04 -0.04 -0.04 -0.02 -0.02 0.01
[,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22]
ACF 0.71 0.68 0.66 0.63 0.60 0.58 0.56 0.53 0.51 0.48 0.46
PACF 0.00 0.00 0.00 0.00 -0.01 0.00 0.00 -0.01 -0.01 -0.02 -0.02
[,23] [,24] [,25]
ACF 0.44 0.42 0.39
PACF -0.02 -0.02 -0.01
# Plot the differenced data
plot(diff(x))
# Plot the P/ACF pair of the differenced data
acf2(diff(x))
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
ACF 0.89 0.81 0.70 0.63 0.55 0.51 0.46 0.42 0.36 0.33 0.31 0.31
PACF 0.89 0.03 -0.11 0.09 -0.05 0.11 -0.03 -0.02 -0.07 0.08 0.10 0.05
[,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23]
ACF 0.28 0.27 0.22 0.20 0.15 0.11 0.08 0.07 0.04 0.03 0.04
PACF -0.13 0.03 -0.10 0.06 -0.10 -0.10 0.08 0.05 -0.06 0.03 0.11
[,24] [,25]
ACF 0.07 0.09
PACF 0.10 0.01
As you can see, differencing the data in your ARIMA(1,1,0) model makes it stationary and allows for further analysis.
Before analyzing actual time series data, you should try working with a slightly more complicated model.
Here, we generated 250 observations from the ARIMA(2,1,0) model with drift given by
Yt = δ + 1.5 Yt−1 − .75 Yt−2 + Wt,
where Yt = Xt − Xt−1 and Wt is standard normal white noise.
x <- arima.sim(model = list(order = c(2, 1, 0), ar = c(1.5, -.75)), n = 250)
You will use the established techniques to fit a model to the data.
# Plot sample P/ACF of differenced data and determine model
acf2(diff(x))
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
ACF 0.86 0.56 0.24 -0.04 -0.21 -0.27 -0.25 -0.20 -0.13 -0.07 -0.02 -0.01
PACF 0.86 -0.70 0.09 -0.06 0.03 0.02 -0.06 -0.05 0.02 0.00 -0.02 -0.12
[,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23]
ACF -0.03 -0.05 -0.06 -0.04 0.00 0.06 0.11 0.14 0.15 0.15 0.13
PACF 0.01 0.02 0.04 0.01 0.03 0.05 -0.01 0.00 0.09 0.03 -0.06
[,24] [,25] [,26]
ACF 0.09 0.04 -0.02
PACF 0.05 -0.08 -0.01
# Estimate parameters and examine output
sarima(x, 2, 1, 0)
initial value 1.062512
iter 2 value 0.934631
iter 3 value 0.501233
iter 4 value 0.267352
iter 5 value 0.117267
iter 6 value 0.024744
iter 7 value 0.021312
iter 8 value 0.021118
iter 9 value 0.021111
iter 10 value 0.021097
iter 11 value 0.021095
iter 12 value 0.021092
iter 13 value 0.021092
iter 14 value 0.021092
iter 14 value 0.021092
iter 14 value 0.021092
final value 0.021092
converged
initial value 0.025828
iter 2 value 0.025819
iter 3 value 0.025803
iter 4 value 0.025802
iter 5 value 0.025801
iter 6 value 0.025801
iter 6 value 0.025801
iter 6 value 0.025801
final value 0.025801
converged
$fit
Call:
stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
Q), period = S), xreg = constant, transform.pars = trans, fixed = fixed,
optim.control = list(trace = trc, REPORT = 1, reltol = tol))
Coefficients:
ar1 ar2 constant
1.4776 -0.7084 0.2600
s.e. 0.0443 0.0445 0.2794
sigma^2 estimated as 1.041: log likelihood = -361.18, aic = 730.37
$degrees_of_freedom
[1] 247
$ttable
Estimate SE t.value p.value
ar1 1.4776 0.0443 33.3799 0.000
ar2 -0.7084 0.0445 -15.9200 0.000
constant 0.2600 0.2794 0.9305 0.353
$AIC
[1] 2.921479
$AICc
[1] 2.921869
$BIC
[1] 2.977822
As you can see from your t-table, the estimated parameters are very close to 1.5 and -0.75.
Now that you have some experience fitting an ARIMA model to simulated data, your next task is to apply your skills to some real world data.
The data in globtemp (from astsa) are the annual global temperature deviations to 2015. In this exercise, you will use established techniques to fit an ARIMA model to the data. A plot of the data shows random walk behavior, which suggests you should work with the differenced data. The differenced data diff(globtemp) are also plotted.
plot(globtemp)
plot(diff(globtemp))
After plotting the sample ACF and PACF of the differenced data diff(globtemp), you can say that either (1) the ACF and the PACF are both tailing off, implying an ARIMA(1,1,1) model, (2) the ACF cuts off at lag 2 and the PACF is tailing off, implying an ARIMA(0,1,2) model, or (3) the ACF is tailing off and the PACF cuts off at lag 3, implying an ARIMA(3,1,0) model. Although the ARIMA(3,1,0) model fits reasonably well, it is the worst of the three (you can check it) because it uses too many parameters for such small autocorrelations. After fitting the first two models, check the AIC and BIC to choose the preferred model.
# Plot the sample P/ACF pair of the differenced data
acf2(diff(globtemp))
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
ACF -0.24 -0.19 -0.08 0.20 -0.15 -0.03 0.03 0.14 -0.16 0.11 -0.05 0.00
PACF -0.24 -0.26 -0.23 0.06 -0.16 -0.09 -0.05 0.07 -0.09 0.11 -0.03 -0.02
[,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22]
ACF -0.13 0.14 -0.01 -0.08 0 0.19 -0.07 0.02 -0.02 0.08
PACF -0.10 0.02 0.00 -0.09 0 0.11 0.04 0.13 0.09 0.08
# Fit an ARIMA(1,1,1) model to globtemp
sarima(globtemp, 1, 1, 1)
initial value -2.218917
iter 2 value -2.253118
iter 3 value -2.263750
iter 4 value -2.272144
iter 5 value -2.282786
iter 6 value -2.296777
iter 7 value -2.297062
iter 8 value -2.297253
iter 9 value -2.297389
iter 10 value -2.297405
iter 11 value -2.297413
iter 12 value -2.297413
iter 13 value -2.297414
iter 13 value -2.297414
iter 13 value -2.297414
final value -2.297414
converged
initial value -2.305504
iter 2 value -2.305800
iter 3 value -2.305821
iter 4 value -2.306655
iter 5 value -2.306875
iter 6 value -2.306950
iter 7 value -2.306955
iter 8 value -2.306955
iter 8 value -2.306955
final value -2.306955
converged
$fit
Call:
stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
Q), period = S), xreg = constant, transform.pars = trans, fixed = fixed,
optim.control = list(trace = trc, REPORT = 1, reltol = tol))
Coefficients:
ar1 ma1 constant
0.3549 -0.7663 0.0072
s.e. 0.1314 0.0874 0.0032
sigma^2 estimated as 0.009885: log likelihood = 119.88, aic = -231.76
$degrees_of_freedom
[1] 132
$ttable
Estimate SE t.value p.value
ar1 0.3549 0.1314 2.7008 0.0078
ma1 -0.7663 0.0874 -8.7701 0.0000
constant 0.0072 0.0032 2.2738 0.0246
$AIC
[1] -1.716773
$AICc
[1] -1.715416
$BIC
[1] -1.630691
# Fit an ARIMA(0,1,2) model to globtemp. Which model is better?
sarima(globtemp, 0, 1, 2)
initial value -2.220513
iter 2 value -2.294887
iter 3 value -2.307682
iter 4 value -2.309170
iter 5 value -2.310360
iter 6 value -2.311251
iter 7 value -2.311636
iter 8 value -2.311648
iter 9 value -2.311649
iter 9 value -2.311649
iter 9 value -2.311649
final value -2.311649
converged
initial value -2.310187
iter 2 value -2.310197
iter 3 value -2.310199
iter 4 value -2.310201
iter 5 value -2.310202
iter 5 value -2.310202
iter 5 value -2.310202
final value -2.310202
converged
$fit
Call:
stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
Q), period = S), xreg = constant, transform.pars = trans, fixed = fixed,
optim.control = list(trace = trc, REPORT = 1, reltol = tol))
Coefficients:
ma1 ma2 constant
-0.3984 -0.2173 0.0072
s.e. 0.0808 0.0768 0.0033
sigma^2 estimated as 0.00982: log likelihood = 120.32, aic = -232.64
$degrees_of_freedom
[1] 132
$ttable
Estimate SE t.value p.value
ma1 -0.3984 0.0808 -4.9313 0.0000
ma2 -0.2173 0.0768 -2.8303 0.0054
constant 0.0072 0.0033 2.1463 0.0337
$AIC
[1] -1.723268
$AICc
[1] -1.721911
$BIC
[1] -1.637185
Judging by the AIC and BIC, the ARIMA(0,1,2) model performs better than the ARIMA(1,1,1) model on the globtemp data.
One way to check an analysis is to overfit the model by adding an extra parameter to see if it makes a difference in the results. If adding parameters changes the results drastically, then you should rethink your model. If, however, the results do not change by much, you can be confident that your fit is correct.
We generated 250 observations from an ARIMA(0,1,1) model with MA parameter .9. First, you will fit the model to the data using established techniques.
x <- arima.sim(model = list(order = c(0, 1, 1), ma = .9), n = 250)
Then, you can check a model by overfitting (adding a parameter) to see if it makes a difference. In this case, you will add an additional MA parameter to see that it is not needed.
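The overfitting check can be sketched with base R alone. This is a minimal, hypothetical example using `stats::arima()` (which astsa's `sarima()` wraps), with its own simulated series so it is self-contained:

```r
# A minimal sketch of the overfitting check, using base R's arima() rather
# than astsa's sarima() wrapper. The simulated series is illustrative,
# generated the same way as in the exercise.
set.seed(42)
x <- arima.sim(model = list(order = c(0, 1, 1), ma = 0.9), n = 250)

fit1 <- arima(x, order = c(0, 1, 1))  # proposed model
fit2 <- arima(x, order = c(0, 1, 2))  # overfit: one extra MA parameter

# If fit1 is adequate, the extra ma2 estimate should be small relative to
# its standard error, and the information criteria should not improve.
t_ma2 <- fit2$coef["ma2"] / sqrt(fit2$var.coef["ma2", "ma2"])  # approx. t-statistic
t_ma2
c(AIC(fit1), AIC(fit2))
```

A small |t| for `ma2` and a non-decreasing AIC both point back to the simpler model.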
plot(x)
plot(diff(x))
# Plot sample P/ACF pair of the differenced data
acf2(diff(x))
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
ACF 0.47 -0.06 -0.06 -0.07 -0.07 -0.04 -0.06 -0.14 -0.10 -0.01 0.01
PACF 0.47 -0.36 0.22 -0.24 0.13 -0.12 0.00 -0.16 0.07 -0.07 0.04
[,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22]
ACF 0.00 -0.09 -0.16 -0.05 0.03 0.05 0.11 0.15 0.14 0.05 -0.04
PACF -0.07 -0.11 -0.08 0.06 -0.09 0.10 0.01 0.12 0.03 -0.04 -0.06
[,23] [,24] [,25] [,26]
ACF -0.02 0.04 -0.02 -0.12
PACF 0.09 0.00 -0.04 -0.08
# Fit the first model, compare parameters, check diagnostics
sarima(x, 0, 1, 1)
initial value 0.288688
iter 2 value 0.104633
iter 3 value 0.042956
iter 4 value 0.009584
iter 5 value 0.006315
iter 6 value 0.005790
iter 7 value 0.005509
iter 8 value 0.005493
iter 9 value 0.005487
iter 10 value 0.005483
iter 11 value 0.005481
iter 12 value 0.005481
iter 13 value 0.005481
iter 14 value 0.005481
iter 15 value 0.005481
iter 15 value 0.005481
iter 15 value 0.005481
final value 0.005481
converged
initial value 0.006738
iter 2 value 0.006593
iter 3 value 0.006574
iter 4 value 0.006562
iter 5 value 0.006556
iter 6 value 0.006555
iter 7 value 0.006555
iter 8 value 0.006554
iter 8 value 0.006554
iter 8 value 0.006554
final value 0.006554
converged
$fit
Call:
stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
Q), period = S), xreg = constant, transform.pars = trans, fixed = fixed,
optim.control = list(trace = trc, REPORT = 1, reltol = tol))
Coefficients:
ma1 constant
0.8879 0.2688
s.e. 0.0337 0.1196
sigma^2 estimated as 1.007: log likelihood = -356.37, aic = 718.75
$degrees_of_freedom
[1] 248
$ttable
Estimate SE t.value p.value
ma1 0.8879 0.0337 26.3709 0.0000
constant 0.2688 0.1196 2.2474 0.0255
$AIC
[1] 2.874986
$AICc
[1] 2.87518
$BIC
[1] 2.917244
# Fit the second model and compare fit
sarima(x, 0, 1, 2)
initial value 0.288688
iter 2 value 0.094692
iter 3 value 0.015978
iter 4 value 0.011758
iter 5 value 0.006286
iter 6 value 0.005152
iter 7 value 0.005140
iter 8 value 0.005134
iter 9 value 0.005133
iter 10 value 0.005132
iter 11 value 0.005131
iter 12 value 0.005131
iter 13 value 0.005131
iter 13 value 0.005131
iter 13 value 0.005131
final value 0.005131
converged
initial value 0.006685
iter 2 value 0.006513
iter 3 value 0.006459
iter 4 value 0.006446
iter 5 value 0.006441
iter 6 value 0.006439
iter 7 value 0.006438
iter 8 value 0.006438
iter 9 value 0.006438
iter 10 value 0.006438
iter 11 value 0.006438
iter 11 value 0.006438
iter 11 value 0.006438
final value 0.006438
converged
$fit
Call:
stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
Q), period = S), xreg = constant, transform.pars = trans, fixed = fixed,
optim.control = list(trace = trc, REPORT = 1, reltol = tol))
Coefficients:
ma1 ma2 constant
0.8743 -0.0165 0.2686
s.e. 0.0655 0.0683 0.1177
sigma^2 estimated as 1.007: log likelihood = -356.34, aic = 720.69
$degrees_of_freedom
[1] 247
$ttable
Estimate SE t.value p.value
ma1 0.8743 0.0655 13.3560 0.0000
ma2 -0.0165 0.0683 -0.2413 0.8095
constant 0.2686 0.1177 2.2823 0.0233
$AIC
[1] 2.882753
$AICc
[1] 2.883143
$BIC
[1] 2.939097
As you can see from the t-table, the second MA parameter is not significantly different from zero and the first MA parameter is approximately the same in each run. Also, the AIC and BIC both increase when the parameter is added. In addition, the residual analysis of your ARIMA(0,1,1) model is fine. All of these facts together indicate that you have a successful model fit.
You can now finish your analysis of global temperatures. Recall that you previously fit two models to the data in globtemp, an ARIMA(1,1,1) and an ARIMA(0,1,2). In the final analysis, check the residual diagnostics and use AIC and BIC for model choice.
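As a hedged base-R aside, the same AIC/BIC comparison works on any series via `arima()`; the built-in `LakeHuron` data stand in here purely for illustration, since `globtemp` lives in astsa:

```r
# Compare two candidate models by information criteria (lower is better).
# LakeHuron is a built-in annual series used only as a stand-in example.
fit_a <- arima(LakeHuron, order = c(1, 0, 1))  # ARMA(1,1) candidate
fit_b <- arima(LakeHuron, order = c(0, 0, 2))  # MA(2) candidate

c(AIC(fit_a), AIC(fit_b))
c(BIC(fit_a), BIC(fit_b))
```

Whichever model shows the lower AIC and BIC is preferred, just as with the `$AIC`/`$BIC` values that `sarima()` prints.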
# Fit ARIMA(0,1,2) to globtemp and check diagnostics
sarima(globtemp, 0, 1, 2)
initial value -2.220513
iter 2 value -2.294887
iter 3 value -2.307682
iter 4 value -2.309170
iter 5 value -2.310360
iter 6 value -2.311251
iter 7 value -2.311636
iter 8 value -2.311648
iter 9 value -2.311649
iter 9 value -2.311649
iter 9 value -2.311649
final value -2.311649
converged
initial value -2.310187
iter 2 value -2.310197
iter 3 value -2.310199
iter 4 value -2.310201
iter 5 value -2.310202
iter 5 value -2.310202
iter 5 value -2.310202
final value -2.310202
converged
$fit
Call:
stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
Q), period = S), xreg = constant, transform.pars = trans, fixed = fixed,
optim.control = list(trace = trc, REPORT = 1, reltol = tol))
Coefficients:
ma1 ma2 constant
-0.3984 -0.2173 0.0072
s.e. 0.0808 0.0768 0.0033
sigma^2 estimated as 0.00982: log likelihood = 120.32, aic = -232.64
$degrees_of_freedom
[1] 132
$ttable
Estimate SE t.value p.value
ma1 -0.3984 0.0808 -4.9313 0.0000
ma2 -0.2173 0.0768 -2.8303 0.0054
constant 0.0072 0.0033 2.1463 0.0337
$AIC
[1] -1.723268
$AICc
[1] -1.721911
$BIC
[1] -1.637185
# Fit ARIMA(1,1,1) to globtemp and check diagnostics
sarima(globtemp, 1, 1, 1)
initial value -2.218917
iter 2 value -2.253118
iter 3 value -2.263750
iter 4 value -2.272144
iter 5 value -2.282786
iter 6 value -2.296777
iter 7 value -2.297062
iter 8 value -2.297253
iter 9 value -2.297389
iter 10 value -2.297405
iter 11 value -2.297413
iter 12 value -2.297413
iter 13 value -2.297414
iter 13 value -2.297414
iter 13 value -2.297414
final value -2.297414
converged
initial value -2.305504
iter 2 value -2.305800
iter 3 value -2.305821
iter 4 value -2.306655
iter 5 value -2.306875
iter 6 value -2.306950
iter 7 value -2.306955
iter 8 value -2.306955
iter 8 value -2.306955
final value -2.306955
converged
$fit
Call:
stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
Q), period = S), xreg = constant, transform.pars = trans, fixed = fixed,
optim.control = list(trace = trc, REPORT = 1, reltol = tol))
Coefficients:
ar1 ma1 constant
0.3549 -0.7663 0.0072
s.e. 0.1314 0.0874 0.0032
sigma^2 estimated as 0.009885: log likelihood = 119.88, aic = -231.76
$degrees_of_freedom
[1] 132
$ttable
Estimate SE t.value p.value
ar1 0.3549 0.1314 2.7008 0.0078
ma1 -0.7663 0.0874 -8.7701 0.0000
constant 0.0072 0.0032 2.2738 0.0246
$AIC
[1] -1.716773
$AICc
[1] -1.715416
$BIC
[1] -1.630691
Your model diagnostics suggest that both the ARIMA(0,1,2) and the ARIMA(1,1,1) are reasonable models. However, the AIC and BIC suggest that the ARIMA(0,1,2) performs slightly better, so this should be your preferred model. Although you were not asked to do so, you can use overfitting to assess the final model.
sarima(globtemp, 0, 1, 3)
initial value -2.220513
iter 2 value -2.301394
iter 3 value -2.309608
iter 4 value -2.309941
iter 5 value -2.311492
iter 6 value -2.312474
iter 7 value -2.312669
iter 8 value -2.312672
iter 9 value -2.312672
iter 9 value -2.312672
iter 9 value -2.312672
final value -2.312672
converged
initial value -2.311272
iter 2 value -2.311279
iter 3 value -2.311281
iter 4 value -2.311283
iter 5 value -2.311283
iter 5 value -2.311283
iter 5 value -2.311283
final value -2.311283
converged
$fit
Call:
stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
Q), period = S), xreg = constant, transform.pars = trans, fixed = fixed,
optim.control = list(trace = trc, REPORT = 1, reltol = tol))
Coefficients:
ma1 ma2 ma3 constant
-0.3760 -0.2115 -0.0464 0.0072
s.e. 0.0929 0.0779 0.0865 0.0032
sigma^2 estimated as 0.009798: log likelihood = 120.47, aic = -230.93
$degrees_of_freedom
[1] 131
$ttable
Estimate SE t.value p.value
ma1 -0.3760 0.0929 -4.0458 0.0001
ma2 -0.2115 0.0779 -2.7157 0.0075
ma3 -0.0464 0.0865 -0.5370 0.5922
constant 0.0072 0.0032 2.2574 0.0256
$AIC
[1] -1.710615
$AICc
[1] -1.708336
$BIC
[1] -1.603012
sarima(globtemp, 1, 1, 2)
initial value -2.218917
iter 2 value -2.294581
iter 3 value -2.300158
iter 4 value -2.301669
iter 5 value -2.302725
iter 6 value -2.303670
iter 7 value -2.303969
iter 8 value -2.304326
iter 9 value -2.305246
iter 10 value -2.305334
iter 11 value -2.305342
iter 12 value -2.305345
iter 13 value -2.305346
iter 14 value -2.305347
iter 15 value -2.305347
iter 15 value -2.305347
iter 15 value -2.305347
final value -2.305347
converged
initial value -2.310020
iter 2 value -2.310150
iter 3 value -2.310212
iter 4 value -2.310301
iter 5 value -2.310376
iter 6 value -2.310605
iter 7 value -2.310722
iter 8 value -2.310730
iter 8 value -2.310730
iter 8 value -2.310730
final value -2.310730
converged
$fit
Call:
stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
Q), period = S), xreg = constant, transform.pars = trans, fixed = fixed,
optim.control = list(trace = trc, REPORT = 1, reltol = tol))
Coefficients:
ar1 ma1 ma2 constant
0.1035 -0.4911 -0.1741 0.0072
s.e. 0.2732 0.2617 0.1453 0.0032
sigma^2 estimated as 0.009809: log likelihood = 120.39, aic = -230.78
$degrees_of_freedom
[1] 131
$ttable
Estimate SE t.value p.value
ar1 0.1035 0.2732 0.3787 0.7055
ma1 -0.4911 0.2617 -1.8762 0.0629
ma2 -0.1741 0.1453 -1.1979 0.2331
constant 0.0072 0.0032 2.2108 0.0288
$AIC
[1] -1.70951
$AICc
[1] -1.70723
$BIC
[1] -1.601907
Now that you are an expert at fitting ARIMA models, you can use your skills for forecasting. First, you will work with simulated data.
We generated 120 observations from an ARIMA(1,1,0) model with AR parameter .9.
y <- arima.sim(model = list(order = c(1, 1, 0), ar = .9), n = 120)
x <- y[1:100]
plot(y)
plot(diff(y))
The data are in y and the first 100 observations are in x. These observations are plotted for you. You will fit an ARIMA(1,1,0) model to the data in x and verify that the model fits well. Then use sarima.for() from astsa to forecast the data 20 time periods ahead. You will then compare the forecasts to the actual data in y.
The basic syntax for forecasting is sarima.for(data, n.ahead, p, d, q), where n.ahead is a positive integer that specifies the forecast horizon. The predicted values and their standard errors are printed, the data are plotted in black, and the forecasts are plotted in red along with ±2 root mean square prediction error bounds as blue dashed lines.
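The same workflow can be sketched with base R alone: fit with `arima()`, then forecast with `predict()`, whose `n.ahead` argument plays the same role as in `sarima.for()`. The data are simulated here so the sketch is self-contained:

```r
# Hedged base-R analogue of sarima.for(): the point forecasts come from
# predict(), and the blue dashed bounds correspond to pred +/- 2 * se.
set.seed(1)
y <- arima.sim(model = list(order = c(1, 1, 0), ar = 0.9), n = 120)
x <- ts(y[1:100])                 # pretend the last observations are unseen

fit  <- arima(x, order = c(1, 1, 0))
fcst <- predict(fit, n.ahead = 20)

fcst$pred                          # point forecasts
fcst$pred + 2 * fcst$se            # upper prediction bound
fcst$pred - 2 * fcst$se            # lower prediction bound
```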
# Plot P/ACF pair of differenced data
acf2(diff(y))
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
ACF 0.74 0.54 0.44 0.37 0.34 0.37 0.31 0.24 0.20 0.17 0.10 0.04
PACF 0.74 -0.02 0.11 0.04 0.08 0.16 -0.12 -0.01 0.01 0.00 -0.11 -0.07
[,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21]
ACF 0.01 -0.04 -0.02 0.01 -0.04 -0.06 -0.08 -0.14 -0.18
PACF 0.00 -0.07 0.09 0.02 -0.09 0.05 -0.06 -0.09 -0.07
# Fit model - check t-table and diagnostics
sarima(x, 1, 1, 0)
initial value 0.345469
iter 2 value -0.110844
iter 3 value -0.111117
iter 4 value -0.111230
iter 5 value -0.111238
iter 6 value -0.111259
iter 7 value -0.111259
iter 8 value -0.111259
iter 9 value -0.111259
iter 10 value -0.111259
iter 10 value -0.111259
iter 10 value -0.111259
final value -0.111259
converged
initial value -0.095017
iter 2 value -0.095575
iter 3 value -0.096272
iter 4 value -0.096407
iter 5 value -0.096509
iter 6 value -0.096535
iter 7 value -0.096538
iter 8 value -0.096541
iter 9 value -0.096541
iter 10 value -0.096541
iter 10 value -0.096541
iter 10 value -0.096541
final value -0.096541
converged
$fit
Call:
stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
Q), period = S), xreg = constant, transform.pars = trans, fixed = fixed,
optim.control = list(trace = trc, REPORT = 1, reltol = tol))
Coefficients:
ar1 constant
0.7818 -1.1755
s.e. 0.0635 0.4045
sigma^2 estimated as 0.8166: log likelihood = -130.92, aic = 267.83
$degrees_of_freedom
[1] 97
$ttable
Estimate SE t.value p.value
ar1 0.7818 0.0635 12.3192 0.0000
constant -1.1755 0.4045 -2.9061 0.0045
$AIC
[1] 2.705401
$AICc
[1] 2.706664
$BIC
[1] 2.784041
# Forecast the data 20 time periods ahead
sarima.for(x, n.ahead = 20, p = 1, d = 1, q = 0)
$pred
Time Series:
Start = 101
End = 120
Frequency = 1
[1] -129.4489 -129.8696 -130.4549 -131.1690 -131.9838 -132.8773 -133.8322
[8] -134.8353 -135.8761 -136.9462 -138.0393 -139.1503 -140.2754 -141.4116
[15] -142.5562 -143.7077 -144.8643 -146.0251 -147.1890 -148.3555
$se
Time Series:
Start = 101
End = 120
Frequency = 1
[1] 0.9036506 1.8463809 2.8434876 3.8491324 4.8385850 5.7987399
[7] 6.7231252 7.6091547 8.4565309 9.2662830 10.0401767 10.7803475
[13] 11.4890745 12.1686424 12.8212593 13.4490114 14.0538402 14.6375341
[19] 15.2017293 15.7479158
lines(y)
As you can see, the sarima.for() command provides a simple method for forecasting. Although the blue error bands widen as the forecast horizon grows, the point forecasts track the actual data reasonably well.
Now you can try forecasting real data.
Here, you will forecast the annual global temperature deviations globtemp to 2050. Recall that in previous exercises, you fit an ARIMA(0,1,2) model to the data. You will refit the model to confirm it, and then forecast the series 35 years into the future.
# Fit an ARIMA(0,1,2) to globtemp and check the fit
sarima(globtemp, 0, 1, 2)
initial value -2.220513
iter 2 value -2.294887
iter 3 value -2.307682
iter 4 value -2.309170
iter 5 value -2.310360
iter 6 value -2.311251
iter 7 value -2.311636
iter 8 value -2.311648
iter 9 value -2.311649
iter 9 value -2.311649
iter 9 value -2.311649
final value -2.311649
converged
initial value -2.310187
iter 2 value -2.310197
iter 3 value -2.310199
iter 4 value -2.310201
iter 5 value -2.310202
iter 5 value -2.310202
iter 5 value -2.310202
final value -2.310202
converged
$fit
Call:
stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
Q), period = S), xreg = constant, transform.pars = trans, fixed = fixed,
optim.control = list(trace = trc, REPORT = 1, reltol = tol))
Coefficients:
ma1 ma2 constant
-0.3984 -0.2173 0.0072
s.e. 0.0808 0.0768 0.0033
sigma^2 estimated as 0.00982: log likelihood = 120.32, aic = -232.64
$degrees_of_freedom
[1] 132
$ttable
Estimate SE t.value p.value
ma1 -0.3984 0.0808 -4.9313 0.0000
ma2 -0.2173 0.0768 -2.8303 0.0054
constant 0.0072 0.0033 2.1463 0.0337
$AIC
[1] -1.723268
$AICc
[1] -1.721911
$BIC
[1] -1.637185
# Forecast data 35 years into the future
sarima.for(globtemp, n.ahead = 35, p = 0, d= 1, q = 2)
$pred
Time Series:
Start = 2016
End = 2050
Frequency = 1
[1] 0.7995567 0.7745381 0.7816919 0.7888457 0.7959996 0.8031534 0.8103072
[8] 0.8174611 0.8246149 0.8317688 0.8389226 0.8460764 0.8532303 0.8603841
[15] 0.8675379 0.8746918 0.8818456 0.8889995 0.8961533 0.9033071 0.9104610
[22] 0.9176148 0.9247687 0.9319225 0.9390763 0.9462302 0.9533840 0.9605378
[29] 0.9676917 0.9748455 0.9819994 0.9891532 0.9963070 1.0034609 1.0106147
$se
Time Series:
Start = 2016
End = 2050
Frequency = 1
[1] 0.09909556 0.11564576 0.12175580 0.12757353 0.13313729 0.13847769
[7] 0.14361964 0.14858376 0.15338730 0.15804492 0.16256915 0.16697084
[13] 0.17125943 0.17544322 0.17952954 0.18352490 0.18743511 0.19126540
[19] 0.19502047 0.19870459 0.20232164 0.20587515 0.20936836 0.21280424
[25] 0.21618551 0.21951471 0.22279416 0.22602604 0.22921235 0.23235497
[31] 0.23545565 0.23851603 0.24153763 0.24452190 0.24747019
A pure seasonal ARMA time series is correlated at the seasonal lags only. Consequently, its ACF and PACF behave like their nonseasonal counterparts, but at the seasonal lags 1S, 2S, …, where S is the seasonal period (S = 12 for monthly data). As in the nonseasonal case, you have the pure seasonal table:
*The values at nonseasonal lags are zero.
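This "seasonal lags only" behavior is easy to see in a sketch: a pure SAR(1) with S = 12 is just an AR(12) whose only nonzero coefficient sits at lag 12, so its sample ACF spikes near lags 12, 24, … and is near zero elsewhere. The parameter value and seed below are illustrative assumptions:

```r
# Simulate a pure SAR(1)_{S=12}: an AR(12) with a single nonzero coefficient
# at lag 12, then inspect the sample ACF at seasonal and nonseasonal lags.
set.seed(7)
x <- arima.sim(model = list(ar = c(rep(0, 11), 0.8)), n = 600)

a <- acf(x, lag.max = 36, plot = FALSE)$acf
round(a[c(13, 25)], 2)  # lags 12 and 24: large (a[1] is lag 0)
round(a[c(2, 7)], 2)    # lags 1 and 6: near zero
```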
As with other models, you can fit seasonal models in R using the sarima() command in the astsa package.
To get a feeling for how pure seasonal models work, it is best to consider simulated data. We generated 250 observations from a pure seasonal model given by
X_t = .9 X_{t-12} + W_t + .5 W_{t-12},
which we would denote as a SARMA(P = 1, Q = 1) with S = 12.
library(simts)
package ‘simts’ was built under R version 4.0.3
Attaching package: ‘simts’
The following object is masked from ‘package:astsa’:
sales
# Generate a SARMA() + WN()
sarma_wn_model <- SARMA(ar = 0, ma = 0, sar = 0.9, sma = 0.5, s = 12, sigma2 = 1.5) + WN(sigma2 = 1)
sarma_wn_sim <- gen_lts(n = 250, model = sarma_wn_model)
x <- ts(sarma_wn_sim[,3])
plot(x)
You will compare the sample ACF and PACF values from the generated data to the true values for this model.
library(astsa)
# Plot sample P/ACF to lag 60 and compare to the true values
acf2(x, max.lag = 60)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
ACF -0.42 -0.02 0.26 -0.28 0.12 -0.01 0.11 -0.25 0.20 -0.01 -0.41 0.85
PACF -0.42 -0.24 0.20 -0.10 -0.01 -0.06 0.23 -0.24 0.08 -0.01 -0.35 0.74
[,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23]
ACF -0.41 -0.03 0.28 -0.32 0.12 -0.01 0.09 -0.22 0.15 0.03 -0.37
PACF -0.01 -0.01 -0.06 -0.09 -0.06 -0.02 -0.02 0.01 -0.06 0.07 0.10
[,24] [,25] [,26] [,27] [,28] [,29] [,30] [,31] [,32] [,33] [,34]
ACF 0.74 -0.39 -0.02 0.28 -0.33 0.13 -0.01 0.08 -0.20 0.10 0.06
PACF 0.08 -0.03 0.00 -0.09 -0.01 -0.01 0.00 0.01 -0.05 -0.02 0.03
[,35] [,36] [,37] [,38] [,39] [,40] [,41] [,42] [,43] [,44] [,45]
ACF -0.35 0.63 -0.38 -0.03 0.27 -0.36 0.12 -0.03 0.06 -0.18 0.07
PACF -0.03 -0.04 -0.12 -0.07 -0.02 -0.07 -0.06 -0.06 0.02 -0.01 0.03
[,46] [,47] [,48] [,49] [,50] [,51] [,52] [,53] [,54] [,55] [,56]
ACF 0.07 -0.31 0.56 -0.34 -0.04 0.25 -0.36 0.14 -0.01 0.05 -0.15
PACF -0.04 0.08 -0.01 0.04 -0.05 -0.07 -0.02 0.03 0.06 0.03 0.00
[,57] [,58] [,59] [,60]
ACF 0.04 0.1 -0.27 0.50
PACF -0.07 0.0 0.03 0.09
# Fit the seasonal model to x
sarima(x, p = 0, d = 0, q = 0, P = 1, D = 0, Q = 1, S = 12)
initial value 1.408227
iter 2 value 1.343099
iter 3 value 0.721833
iter 4 value 0.716709
iter 5 value 0.706489
iter 6 value 0.706419
iter 7 value 0.706418
iter 8 value 0.706418
iter 9 value 0.706418
iter 10 value 0.706418
iter 11 value 0.706416
iter 11 value 0.706416
iter 11 value 0.706416
final value 0.706416
converged
initial value 0.725710
iter 2 value 0.724925
iter 3 value 0.724755
iter 4 value 0.724468
iter 5 value 0.724464
iter 6 value 0.724461
iter 7 value 0.724451
iter 8 value 0.724446
iter 9 value 0.724445
iter 10 value 0.724445
iter 11 value 0.724445
iter 12 value 0.724445
iter 13 value 0.724445
iter 14 value 0.724445
iter 15 value 0.724445
iter 16 value 0.724445
iter 16 value 0.724444
iter 16 value 0.724444
final value 0.724444
converged
$fit
Call:
stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
Q), period = S), xreg = xmean, include.mean = FALSE, transform.pars = trans,
fixed = fixed, optim.control = list(trace = trc, REPORT = 1, reltol = tol))
Coefficients:
sar1 sma1 xmean
0.8753 -0.0984 -0.9037
s.e. 0.0310 0.0749 0.7096
sigma^2 estimated as 4.004: log likelihood = -535.85, aic = 1079.69
$degrees_of_freedom
[1] 247
$ttable
Estimate SE t.value p.value
sar1 0.8753 0.0310 28.2548 0.0000
sma1 -0.0984 0.0749 -1.3147 0.1898
xmean -0.9037 0.7096 -1.2735 0.2040
$AIC
[1] 4.318766
$AICc
[1] 4.319156
$BIC
[1] 4.375109
Fitting a seasonal model using sarima() requires a few additional arguments, but follows the same basic process as nonseasonal models.
Pure seasonal dependence such as that explored earlier in this chapter is relatively rare. Most seasonal time series have mixed dependence, meaning only some of the variation is explained by seasonal trends.
Recall that the full seasonal model is denoted by SARIMA(p,d,q)x(P,D,Q)S where capital letters denote the seasonal orders.
As before, this exercise asks you to compare the sample P/ACF pair to the true values for some simulated seasonal data and fit a model to the data using sarima(). This time, the simulated data come from a mixed seasonal model, SARIMA(0,0,1)x(0,0,1)12.
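A hedged base-R parallel: the same mixed model can be simulated directly, because the product (1 + .5B)(1 + .5B^12) puts MA weights at lags 1, 12, and 13, which `arima.sim()` accepts as an ordinary MA(13). The parameter values here are illustrative assumptions, not the exercise's exact settings:

```r
# Simulate from SARIMA(0,0,1)x(0,0,1)_12 via the expanded MA polynomial,
# then recover the model with arima()'s seasonal argument.
set.seed(3)
x <- arima.sim(model = list(ma = c(0.5, rep(0, 10), 0.5, 0.25)), n = 400)

fit <- arima(x, order = c(0, 0, 1),
             seasonal = list(order = c(0, 0, 1), period = 12))
coef(fit)  # ma1 and sma1 should land near 0.5
```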
# Generate a SARMA() + WN()
sarma_wn_model <- SARMA(ar = 0, ma = 1, sar = 0, sma = 1, s = 12, sigma2 = 1.5) + WN(sigma2 = 1)
sarma_wn_sim <- gen_lts(n = 210, model = sarma_wn_model, unit_ts = "month", unit_time = "year")
x <- ts(sarma_wn_sim[,3])
plot(x)
# Plot sample P/ACF pair to lag 60 and compare to actual
acf2(x, max.lag = 60)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
ACF -0.43 0.00 0.06 -0.14 0.12 -0.06 -0.03 0.07 -0.01 -0.03 0.17
PACF -0.43 -0.23 -0.05 -0.17 -0.02 -0.04 -0.07 0.00 0.05 -0.02 0.20
[,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22]
ACF -0.30 0.07 0.11 -0.11 0.05 -0.04 0.02 0.01 -0.05 0.04 0
PACF -0.17 -0.16 0.01 -0.02 -0.08 -0.05 -0.02 -0.05 -0.06 0.02 0
[,23] [,24] [,25] [,26] [,27] [,28] [,29] [,30] [,31] [,32] [,33]
ACF 0.04 -0.15 0.13 -0.08 0.1 -0.03 -0.02 0.02 -0.04 0.12 -0.13
PACF 0.14 -0.20 -0.06 -0.03 0.1 -0.02 0.00 -0.02 -0.03 0.09 -0.03
[,34] [,35] [,36] [,37] [,38] [,39] [,40] [,41] [,42] [,43] [,44]
ACF 0.04 -0.01 0.01 0.00 0.01 -0.01 0.07 -0.06 0.02 -0.01 0.0
PACF -0.02 0.03 -0.12 -0.08 0.02 0.04 0.09 -0.02 0.01 -0.04 0.1
[,45] [,46] [,47] [,48] [,49] [,50] [,51] [,52] [,53] [,54] [,55]
ACF -0.02 0.04 -0.04 0.02 -0.03 0.02 -0.07 0.09 0.03 -0.05 0.09
PACF -0.05 0.02 0.01 -0.09 -0.10 -0.01 -0.10 0.08 0.07 0.00 0.09
[,56] [,57] [,58] [,59] [,60]
ACF -0.16 0.12 -0.08 0.02 0.06
PACF 0.01 -0.02 -0.05 0.00 0.00
# Fit the seasonal model to x
sarima(x, p = 0, d = 0, q = 1, P = 0, D = 0, Q = 1, S = 12)
initial value 1.888600
iter 2 value 1.681802
iter 3 value 1.661951
iter 4 value 1.652715
iter 5 value 1.647347
iter 6 value 1.645649
iter 7 value 1.644765
iter 8 value 1.644683
iter 9 value 1.644682
iter 10 value 1.644681
iter 11 value 1.644679
iter 11 value 1.644679
iter 11 value 1.644679
final value 1.644679
converged
initial value 1.641996
iter 2 value 1.641986
iter 3 value 1.641956
iter 4 value 1.641939
iter 4 value 1.641939
iter 4 value 1.641939
final value 1.641939
converged
$fit
Call:
stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
Q), period = S), xreg = xmean, include.mean = FALSE, transform.pars = trans,
fixed = fixed, optim.control = list(trace = trc, REPORT = 1, reltol = tol))
Coefficients:
ma1 sma1 xmean
-0.6183 -0.5204 -0.1482
s.e. 0.0611 0.0669 0.0697
sigma^2 estimated as 26.14: log likelihood = -642.78, aic = 1293.57
$degrees_of_freedom
[1] 207
$ttable
Estimate SE t.value p.value
ma1 -0.6183 0.0611 -10.1179 0.0000
sma1 -0.5204 0.0669 -7.7791 0.0000
xmean -0.1482 0.0697 -2.1262 0.0347
$AIC
[1] 6.15985
$AICc
[1] 6.160404
$BIC
[1] 6.223604
Time series data collected on a seasonal basis typically have mixed dependence. For example, what happens in June is often related to what happened in May as well as what happened in June of last year.
Previously, we fit a seasonal ARIMA model to the log of the monthly AirPassengers data set. You will now start to fit a seasonal ARIMA model to the monthly US unemployment data, unemp, from the astsa package.
The first thing to do is plot the data and note the trend and the seasonal persistence. Then look at the detrended data and remove the seasonal persistence. After that, the fully differenced data should look stationary.
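The same two-step differencing can be sketched on the built-in AirPassengers series (logged to stabilize the variance), since unemp lives in astsa:

```r
# diff() removes the trend; a lag-12 diff() removes the seasonal persistence.
d_ap  <- diff(log(AirPassengers))      # detrended
dd_ap <- diff(d_ap, lag = 12)          # detrended and seasonally differenced

# Each difference shortens the series: 1 + 12 = 13 observations are lost.
length(AirPassengers) - length(dd_ap)  # 13
```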
# Plot unemp
plot(unemp)
# Difference your data and plot it
d_unemp <- diff(unemp)
plot(d_unemp)
# Seasonally difference d_unemp and plot it
dd_unemp <- diff(d_unemp, lag = 12)
plot(dd_unemp)
Now that you have removed the trend and seasonal variation in unemployment, the data appear to be stationary.
Now, you will continue fitting an SARIMA model to the monthly US unemployment unemp time series by looking at the sample ACF and PACF of the fully differenced series.
Note that the lag axis in the sample P/ACF plot is in terms of years. Thus, lags 1, 2, 3, … represent 1 year (12 months), 2 years (24 months), 3 years (36 months), …
Difference the data fully (as in the previous exercise) and plot the sample ACF and PACF of the transformed data to lag 60 months (5 years). Consider that, for the nonseasonal component, the PACF appears to cut off at lag 2 while the ACF tails off, suggesting an AR(2); for the seasonal component, the ACF appears to cut off at the first seasonal lag (12 months) while the PACF tails off, suggesting an SMA(1).
# Plot P/ACF pair of fully differenced data to lag 60
dd_unemp <- diff(diff(unemp), lag = 12)
acf2(dd_unemp, max.lag = 60)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
ACF 0.21 0.33 0.15 0.17 0.10 0.06 -0.06 -0.02 -0.09 -0.17 -0.08 -0.48
PACF 0.21 0.29 0.05 0.05 0.01 -0.02 -0.12 -0.03 -0.05 -0.15 0.02 -0.43
[,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23]
ACF -0.18 -0.16 -0.11 -0.15 -0.09 -0.09 0.03 -0.01 0.02 -0.02 0.01
PACF -0.02 0.15 0.03 -0.04 -0.01 0.00 0.01 0.01 -0.01 -0.16 0.01
[,24] [,25] [,26] [,27] [,28] [,29] [,30] [,31] [,32] [,33] [,34]
ACF -0.02 0.09 -0.05 -0.01 0.03 0.08 0.01 0.03 -0.05 0.01 0.02
PACF -0.27 0.05 -0.01 -0.05 0.05 0.09 -0.04 0.02 -0.07 -0.01 -0.08
[,35] [,36] [,37] [,38] [,39] [,40] [,41] [,42] [,43] [,44] [,45]
ACF -0.06 -0.02 -0.12 0.01 -0.03 -0.03 -0.10 -0.02 -0.13 0.00 -0.06
PACF -0.08 -0.23 -0.08 0.06 -0.07 -0.01 0.03 -0.03 -0.11 -0.04 0.01
[,46] [,47] [,48] [,49] [,50] [,51] [,52] [,53] [,54] [,55] [,56]
ACF 0.01 0.02 0.11 0.13 0.10 0.07 0.10 0.12 0.06 0.14 0.05
PACF 0.00 -0.03 -0.04 0.02 0.03 -0.05 0.02 0.02 -0.08 0.00 -0.03
[,57] [,58] [,59] [,60]
ACF 0.04 0.04 0.07 -0.03
PACF -0.07 0.05 0.04 -0.04
Suggest and fit a model using sarima(). Check the residuals to ensure appropriate model fit.
# Fit an appropriate model
sarima(unemp, p = 2, d = 1, q = 0, P = 0, D = 1, Q = 1, S = 12)
initial value 3.340809
iter 2 value 3.105512
iter 3 value 3.086631
iter 4 value 3.079778
iter 5 value 3.069447
iter 6 value 3.067659
iter 7 value 3.067426
iter 8 value 3.067418
iter 8 value 3.067418
final value 3.067418
converged
initial value 3.065481
iter 2 value 3.065478
iter 3 value 3.065477
iter 3 value 3.065477
iter 3 value 3.065477
final value 3.065477
converged
$fit
Call:
stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
Q), period = S), include.mean = !no.constant, transform.pars = trans, fixed = fixed,
optim.control = list(trace = trc, REPORT = 1, reltol = tol))
Coefficients:
ar1 ar2 sma1
0.1351 0.2464 -0.6953
s.e. 0.0513 0.0515 0.0381
sigma^2 estimated as 449.6: log likelihood = -1609.91, aic = 3227.81
$degrees_of_freedom
[1] 356
$ttable
Estimate SE t.value p.value
ar1 0.1351 0.0513 2.6326 0.0088
ar2 0.2464 0.0515 4.7795 0.0000
sma1 -0.6953 0.0381 -18.2362 0.0000
$AIC
[1] 8.723811
$AICc
[1] 8.723988
$BIC
[1] 8.765793
As always, keep a close eye on the output from your sarima() command to get a feel for the fit of your model.
Making money in commodities is not easy. Most commodities traders lose money rather than make it. The astsa package includes the data set chicken, the monthly whole bird spot price (Georgia docks, US cents per pound) from August 2001 to July 2016.
plot(chicken)
First, you will use your skills to carefully fit an SARIMA model to the commodity. Later, you will use the fitted model to try and forecast the whole bird spot price.
After removing the trend, the sample ACF and PACF suggest an AR(2) model because the PACF cuts off after lag 2 and the ACF tails off. However, the ACF has a small seasonal component remaining. This can be taken care of by fitting an additional SAR(1) component.
By the way, if you are interested in analyzing other commodities from various regions, you can find many different time series at Index Mundi.
Plot the differenced (d = 1) data diff(chicken). Note that the trend is removed and that seasonal behavior remains.
# Plot differenced chicken
plot(diff(chicken))
Plot the sample ACF and PACF of the differenced data to lag 60 (5 years). Notice that an AR(2) seems appropriate but there is a small but significant seasonal component remaining in the detrended data.
# Plot P/ACF pair of differenced data to lag 60
acf2(diff(chicken), max.lag = 60)
Fit an ARIMA(2,1,0) to the chicken data to see that there is correlation remaining in the residuals.
# Fit ARIMA(2,1,0) to chicken - not so good
sarima(chicken, 2, 1, 0)
initial value 0.001863
iter 2 value -0.156034
iter 3 value -0.359181
iter 4 value -0.424164
iter 5 value -0.430212
iter 6 value -0.432744
iter 7 value -0.432747
iter 8 value -0.432749
iter 9 value -0.432749
iter 10 value -0.432751
iter 11 value -0.432752
iter 12 value -0.432752
iter 13 value -0.432752
iter 13 value -0.432752
iter 13 value -0.432752
final value -0.432752
converged
initial value -0.420883
iter 2 value -0.420934
iter 3 value -0.420936
iter 4 value -0.420937
iter 5 value -0.420937
iter 6 value -0.420937
iter 6 value -0.420937
iter 6 value -0.420937
final value -0.420937
converged
$fit
Call:
stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
Q), period = S), xreg = constant, transform.pars = trans, fixed = fixed,
optim.control = list(trace = trc, REPORT = 1, reltol = tol))
Coefficients:
ar1 ar2 constant
0.9494 -0.3069 0.2632
s.e. 0.0717 0.0718 0.1362
sigma^2 estimated as 0.4286: log likelihood = -178.64, aic = 365.28
$degrees_of_freedom
[1] 176
$ttable
Estimate SE t.value p.value
ar1 0.9494 0.0717 13.2339 0.0000
ar2 -0.3069 0.0718 -4.2723 0.0000
constant 0.2632 0.1362 1.9328 0.0549
$AIC
[1] 2.040695
$AICc
[1] 2.041461
$BIC
[1] 2.111922
Fit an SARIMA(2,1,0)x(1,0,0)12 and notice the model fits well.
# Fit SARIMA(2,1,0,1,0,0,12) to chicken - that works
sarima(chicken, 2, 1, 0, 1, 0, 0, 12)
initial value 0.015039
iter 2 value -0.226398
iter 3 value -0.412955
iter 4 value -0.460882
iter 5 value -0.470787
iter 6 value -0.471082
iter 7 value -0.471088
iter 8 value -0.471090
iter 9 value -0.471092
iter 10 value -0.471095
iter 11 value -0.471095
iter 12 value -0.471096
iter 13 value -0.471096
iter 14 value -0.471096
iter 15 value -0.471097
iter 16 value -0.471097
iter 16 value -0.471097
iter 16 value -0.471097
final value -0.471097
converged
initial value -0.473585
iter 2 value -0.473664
iter 3 value -0.473721
iter 4 value -0.473823
iter 5 value -0.473871
iter 6 value -0.473885
iter 7 value -0.473886
iter 8 value -0.473886
iter 8 value -0.473886
iter 8 value -0.473886
final value -0.473886
converged
$fit
Call:
stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
Q), period = S), xreg = constant, transform.pars = trans, fixed = fixed,
optim.control = list(trace = trc, REPORT = 1, reltol = tol))
Coefficients:
ar1 ar2 sar1 constant
0.9154 -0.2494 0.3237 0.2353
s.e. 0.0733 0.0739 0.0715 0.1973
sigma^2 estimated as 0.3828: log likelihood = -169.16, aic = 348.33
$degrees_of_freedom
[1] 175
$ttable
Estimate SE t.value p.value
ar1 0.9154 0.0733 12.4955 0.0000
ar2 -0.2494 0.0739 -3.3728 0.0009
sar1 0.3237 0.0715 4.5238 0.0000
constant 0.2353 0.1973 1.1923 0.2347
$AIC
[1] 1.945971
$AICc
[1] 1.947256
$BIC
[1] 2.035005
Now you will use your new skills to carefully fit an SARIMA model to the birth time series from astsa. The data are monthly live births (adjusted) in thousands for the United States, 1948-1979, and include the baby boom after WWII.
plot(birth)
Note the long-term trend (random walk) and the seasonal component of the data.
Use diff() to difference the data (d_birth). Use acf2() to view the sample ACF and PACF of this data to lag 60. Notice the seasonal persistence.
# Plot P/ACF to lag 60 of differenced data
d_birth <- diff(birth)
acf2(d_birth, max.lag = 60)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
ACF -0.32 0.16 -0.08 -0.19 0.09 -0.28 0.06 -0.19 -0.05 0.17 -0.26
PACF -0.32 0.06 -0.01 -0.25 -0.03 -0.26 -0.17 -0.29 -0.35 -0.16 -0.59
[,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22]
ACF 0.82 -0.28 0.17 -0.07 -0.18 0.08 -0.28 0.07 -0.18 -0.05 0.16
PACF 0.57 0.13 0.11 0.13 0.09 0.00 0.00 0.05 0.04 -0.07 -0.10
[,23] [,24] [,25] [,26] [,27] [,28] [,29] [,30] [,31] [,32] [,33]
ACF -0.24 0.78 -0.27 0.19 -0.08 -0.17 0.07 -0.29 0.07 -0.15 -0.04
PACF -0.20 0.19 0.01 0.05 0.07 0.07 -0.02 -0.06 -0.02 0.09 0.03
[,34] [,35] [,36] [,37] [,38] [,39] [,40] [,41] [,42] [,43] [,44]
ACF 0.14 -0.24 0.75 -0.23 0.16 -0.08 -0.15 0.05 -0.25 0.06 -0.18
PACF -0.06 -0.16 0.03 0.08 -0.10 -0.03 0.07 -0.04 0.06 0.04 -0.07
[,45] [,46] [,47] [,48] [,49] [,50] [,51] [,52] [,53] [,54] [,55]
ACF -0.03 0.15 -0.22 0.72 -0.24 0.16 -0.08 -0.13 0.05 -0.26 0.05
PACF -0.06 0.02 -0.04 0.10 0.01 0.00 -0.03 0.04 0.03 0.00 -0.01
[,56] [,57] [,58] [,59] [,60]
ACF -0.17 -0.02 0.15 -0.23 0.70
PACF 0.01 0.03 0.04 -0.09 0.04
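The large, slowly decaying ACF values at the seasonal lags (0.82 at lag 12, 0.78 at 24, 0.75 at 36, 0.72 at 48, 0.70 at 60) are what "seasonal persistence" looks like. As a quick illustration (sketched in Python only because the arithmetic is language-agnostic; the toy series and the sample_acf helper are made up for this example), a strongly periodic series shows the same pattern in its sample ACF:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 360  # 30 "years" of monthly data
t = np.arange(n)
# toy series: strong 12-period cycle plus noise
x = np.sin(2 * np.pi * t / 12) + 0.3 * rng.standard_normal(n)

def sample_acf(x, lag):
    """Sample autocorrelation at a single lag (biased estimator)."""
    x = x - x.mean()
    return np.dot(x[:-lag], x[lag:]) / np.dot(x, x)

# large at every seasonal lag, decaying only slowly
seasonal = [sample_acf(x, k) for k in (12, 24, 36, 48, 60)]
print([round(v, 2) for v in seasonal])
```

Like the d_birth ACF, the seasonal-lag correlations stay high out to lag 60, which is the signal that a seasonal difference is still needed.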
Use another call to diff() to take the seasonal difference of the data. Save this to dd_birth. Use another call to acf2() to view the ACF and PACF of this data, again to lag 60. Conclude that an SARIMA(0,1,1)x(0,1,1)12 model seems reasonable.
# Plot P/ACF to lag 60 of seasonal differenced data
dd_birth <- diff(d_birth, lag = 12)
acf2(dd_birth, max.lag = 60)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
ACF -0.3 -0.09 -0.09 0.00 0.07 0.03 -0.07 -0.04 0.11 0.04 0.13 -0.43
PACF -0.3 -0.20 -0.21 -0.14 -0.03 0.02 -0.06 -0.08 0.06 0.08 0.23 -0.32
[,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23]
ACF 0.14 -0.01 0.03 0.01 0.02 0.00 0.03 -0.07 -0.01 0 0.06
PACF -0.06 -0.13 -0.13 -0.11 0.02 0.06 0.04 -0.10 0.02 0 0.17
[,24] [,25] [,26] [,27] [,28] [,29] [,30] [,31] [,32] [,33] [,34]
ACF -0.01 -0.12 0.17 -0.04 0.03 -0.05 -0.09 -0.01 0.19 -0.03 -0.09
PACF -0.13 -0.14 0.07 -0.04 -0.02 0.02 -0.06 -0.07 0.05 0.07 -0.06
[,35] [,36] [,37] [,38] [,39] [,40] [,41] [,42] [,43] [,44] [,45]
ACF -0.02 -0.04 0.17 -0.14 0.03 -0.05 0.03 0.10 0 -0.10 -0.03
PACF 0.05 -0.16 -0.01 -0.04 -0.01 -0.03 -0.01 0.01 0 0.03 -0.02
[,46] [,47] [,48] [,49] [,50] [,51] [,52] [,53] [,54] [,55] [,56]
ACF 0.06 0.02 0.01 -0.01 0.06 -0.08 0.03 0.01 -0.02 -0.01 0.00
PACF -0.07 0.05 -0.11 0.05 0.06 -0.03 -0.03 0.04 0.02 -0.04 -0.01
[,57] [,58] [,59] [,60]
ACF -0.07 0.17 -0.04 -0.01
PACF -0.13 0.07 0.07 -0.05
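Differencing at lag 1 and then at lag 12 applies the operator (1 - B)(1 - B^12) to the series, which removes both a linear trend and a fixed monthly pattern. A minimal sketch (in Python, with a made-up noise-free toy series) of why the combination annihilates both components exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 120
t = np.arange(n)
monthly = np.tile(rng.normal(size=12), n // 12)  # fixed repeating monthly pattern
x = 5 + 0.3 * t + monthly  # linear trend plus seasonality (noise-free for clarity)

d = x[1:] - x[:-1]     # nonseasonal difference, (1 - B)x_t: kills the trend
dd = d[12:] - d[:-12]  # seasonal difference at lag 12, (1 - B^12)(1 - B)x_t

# trend + fixed period-12 pattern is annihilated (up to floating-point error)
print(np.allclose(dd, 0))  # → True
```

With real data the differenced series is of course not zero; what remains is the stationary part that the ARMA terms are meant to model.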
Fit the SARIMA(0,1,1)x(0,1,1)12 model. What happens?
# Fit SARIMA(0,1,1)x(0,1,1)_12. What happens?
sarima(birth, 0, 1, 1, 0, 1, 1, 12)
initial value 2.219164
iter 2 value 2.013310
iter 3 value 1.988107
iter 4 value 1.980026
iter 5 value 1.967594
iter 6 value 1.965384
iter 7 value 1.965049
iter 8 value 1.964993
iter 9 value 1.964992
iter 9 value 1.964992
iter 9 value 1.964992
final value 1.964992
converged
initial value 1.951264
iter 2 value 1.945867
iter 3 value 1.945729
iter 4 value 1.945723
iter 5 value 1.945723
iter 5 value 1.945723
iter 5 value 1.945723
final value 1.945723
converged
$fit
Call:
stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
Q), period = S), include.mean = !no.constant, transform.pars = trans, fixed = fixed,
optim.control = list(trace = trc, REPORT = 1, reltol = tol))
Coefficients:
ma1 sma1
-0.4734 -0.7861
s.e. 0.0598 0.0451
sigma^2 estimated as 47.4: log likelihood = -1211.28, aic = 2428.56
$degrees_of_freedom
[1] 358
$ttable
Estimate SE t.value p.value
ma1 -0.4734 0.0598 -7.9097 0
sma1 -0.7861 0.0451 -17.4227 0
$AIC
[1] 6.545975
$AICc
[1] 6.546062
$BIC
[1] 6.577399
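The reported aic is consistent with the definition AIC = -2 log L + 2k. Assuming stats::arima counts ma1, sma1, and sigma^2 as k = 3 parameters (an assumption about its bookkeeping, and the printed values are rounded), a quick arithmetic check:

```python
# Values copied from the sarima() printout above (rounded by R)
log_lik = -1211.28
k = 3  # ma1, sma1, sigma^2 (assumed parameter count)
aic = -2 * log_lik + 2 * k
print(round(aic, 2))  # matches the reported 2428.56
```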
Add an additional nonseasonal AR parameter (p = 1) to account for the extra correlation. Does the model fit well?
# Add AR term and conclude
sarima(birth, 1, 1, 1, 0, 1, 1, 12)
initial value 2.218186
iter 2 value 2.032584
iter 3 value 1.982464
iter 4 value 1.975643
iter 5 value 1.971721
iter 6 value 1.967284
iter 7 value 1.963840
iter 8 value 1.961106
iter 9 value 1.960849
iter 10 value 1.960692
iter 11 value 1.960683
iter 12 value 1.960675
iter 13 value 1.960672
iter 13 value 1.960672
iter 13 value 1.960672
final value 1.960672
converged
initial value 1.940459
iter 2 value 1.934425
iter 3 value 1.932752
iter 4 value 1.931750
iter 5 value 1.931074
iter 6 value 1.930882
iter 7 value 1.930860
iter 8 value 1.930859
iter 8 value 1.930859
final value 1.930859
converged
$fit
Call:
stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
Q), period = S), include.mean = !no.constant, transform.pars = trans, fixed = fixed,
optim.control = list(trace = trc, REPORT = 1, reltol = tol))
Coefficients:
ar1 ma1 sma1
0.3038 -0.7006 -0.8000
s.e. 0.0865 0.0604 0.0441
sigma^2 estimated as 45.91: log likelihood = -1205.93, aic = 2419.85
$degrees_of_freedom
[1] 357
$ttable
Estimate SE t.value p.value
ar1 0.3038 0.0865 3.5104 5e-04
ma1 -0.7006 0.0604 -11.5984 0e+00
sma1 -0.8000 0.0441 -18.1302 0e+00
$AIC
[1] 6.522519
$AICc
[1] 6.522695
$BIC
[1] 6.564418
The residual analysis from the first fit indicated that the residuals were not white noise. Hence, it was necessary to include an additional nonseasonal AR term to account for the extra correlation.
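In numbers, every information criterion prefers the fit with the extra AR term (values copied from the two $AIC/$AICc/$BIC printouts above; smaller is better):

```python
criteria = {
    "SARIMA(0,1,1)x(0,1,1)_12": {"AIC": 6.545975, "AICc": 6.546062, "BIC": 6.577399},
    "SARIMA(1,1,1)x(0,1,1)_12": {"AIC": 6.522519, "AICc": 6.522695, "BIC": 6.564418},
}
for crit in ("AIC", "AICc", "BIC"):
    best = min(criteria, key=lambda m: criteria[m][crit])
    print(crit, "prefers", best)
```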
Previously, you fit an SARIMA(2,1,0)x(0,1,1)12 model to the monthly US unemployment time series unemp. You will now use that model to forecast the data 3 years ahead.
# Fit your previous model to unemp and check the diagnostics
sarima(unemp, 2, 1, 0, 0, 1, 1, 12)
initial value 3.340809
iter 2 value 3.105512
iter 3 value 3.086631
iter 4 value 3.079778
iter 5 value 3.069447
iter 6 value 3.067659
iter 7 value 3.067426
iter 8 value 3.067418
iter 8 value 3.067418
final value 3.067418
converged
initial value 3.065481
iter 2 value 3.065478
iter 3 value 3.065477
iter 3 value 3.065477
iter 3 value 3.065477
final value 3.065477
converged
$fit
Call:
stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
Q), period = S), include.mean = !no.constant, transform.pars = trans, fixed = fixed,
optim.control = list(trace = trc, REPORT = 1, reltol = tol))
Coefficients:
ar1 ar2 sma1
0.1351 0.2464 -0.6953
s.e. 0.0513 0.0515 0.0381
sigma^2 estimated as 449.6: log likelihood = -1609.91, aic = 3227.81
$degrees_of_freedom
[1] 356
$ttable
Estimate SE t.value p.value
ar1 0.1351 0.0513 2.6326 0.0088
ar2 0.2464 0.0515 4.7795 0.0000
sma1 -0.6953 0.0381 -18.2362 0.0000
$AIC
[1] 8.723811
$AICc
[1] 8.723988
$BIC
[1] 8.765793
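Each t.value in the $ttable is simply Estimate/SE, and the p.value comes from comparing it to a t distribution with $degrees_of_freedom df. A quick check against the printout above (the printed values are rounded, so agreement is only approximate):

```python
ttable = {  # Estimate, SE, reported t.value (copied from the printout above)
    "ar1": (0.1351, 0.0513, 2.6326),
    "ar2": (0.2464, 0.0515, 4.7795),
    "sma1": (-0.6953, 0.0381, -18.2362),
}
for name, (est, se, t_reported) in ttable.items():
    print(name, round(est / se, 4), "reported:", t_reported)
```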
# Forecast the data 3 years into the future
sarima.for(unemp, n.ahead = 36, 2, 1, 0, 0, 1, 1, 12)
$pred
Jan Feb Mar Apr May Jun Jul
1979 676.4664 685.1172 653.2388 585.6939 553.8813 664.4072 647.0657
1980 683.3045 687.7649 654.8658 586.1507 553.9285 664.1108 646.6220
1981 682.6406 687.0977 654.1968 585.4806 553.2579 663.4398 645.9508
Aug Sep Oct Nov Dec
1979 611.0828 594.6414 569.3997 587.5801 581.1833
1980 610.5345 594.0427 568.7684 586.9320 580.5249
1981 609.8632 593.3713 568.0970 586.2606 579.8535
$se
Jan Feb Mar Apr May Jun Jul
1979 21.20465 32.07710 43.70167 53.66329 62.85364 71.12881 78.73590
1980 116.99599 124.17344 131.51281 138.60466 145.49706 152.12863 158.52302
1981 194.25167 201.10648 208.17066 215.11503 221.96039 228.64285 235.16874
Aug Sep Oct Nov Dec
1979 85.75096 92.28663 98.41329 104.19488 109.67935
1980 164.68623 170.63839 176.39520 181.97333 187.38718
1981 241.53258 247.74268 253.80549 259.72970 265.52323
As you can see, the forecast is able to replicate much of the seasonal variation in the original unemployment series.
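sarima.for() returns the point forecasts in $pred and their standard errors in $se, so an approximate 95% prediction interval is pred ± 1.96·se. For January 1979 (values copied from the tables above):

```python
pred, se = 676.4664, 21.20465  # Jan 1979 from $pred and $se above
lower = pred - 1.96 * se
upper = pred + 1.96 * se
print(round(lower, 1), round(upper, 1))  # → 634.9 718.0
```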
As previously mentioned, making money in commodities is not easy. To see a difficulty in predicting a commodity, you will forecast the price of chicken five years into the future. When you complete your forecasts, you will note that even just a few years out, the acceptable range of prices is very large. This is because commodities are subject to many sources of variation.
Recall that you previously fit an SARIMA(2,1,0)x(1,0,0)12 model to the monthly US chicken price series chicken. You will use this model to calculate your forecasts.
# Fit the chicken model again and check diagnostics
sarima(chicken, 2, 1, 0, 1, 0, 0, 12)
initial value 0.015039
iter 2 value -0.226398
iter 3 value -0.412955
iter 4 value -0.460882
iter 5 value -0.470787
iter 6 value -0.471082
iter 7 value -0.471088
iter 8 value -0.471090
iter 9 value -0.471092
iter 10 value -0.471095
iter 11 value -0.471095
iter 12 value -0.471096
iter 13 value -0.471096
iter 14 value -0.471096
iter 15 value -0.471097
iter 16 value -0.471097
iter 16 value -0.471097
iter 16 value -0.471097
final value -0.471097
converged
initial value -0.473585
iter 2 value -0.473664
iter 3 value -0.473721
iter 4 value -0.473823
iter 5 value -0.473871
iter 6 value -0.473885
iter 7 value -0.473886
iter 8 value -0.473886
iter 8 value -0.473886
iter 8 value -0.473886
final value -0.473886
converged
$fit
Call:
stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
Q), period = S), xreg = constant, transform.pars = trans, fixed = fixed,
optim.control = list(trace = trc, REPORT = 1, reltol = tol))
Coefficients:
ar1 ar2 sar1 constant
0.9154 -0.2494 0.3237 0.2353
s.e. 0.0733 0.0739 0.0715 0.1973
sigma^2 estimated as 0.3828: log likelihood = -169.16, aic = 348.33
$degrees_of_freedom
[1] 175
$ttable
Estimate SE t.value p.value
ar1 0.9154 0.0733 12.4955 0.0000
ar2 -0.2494 0.0739 -3.3728 0.0009
sar1 0.3237 0.0715 4.5238 0.0000
constant 0.2353 0.1973 1.1923 0.2347
$AIC
[1] 1.945971
$AICc
[1] 1.947256
$BIC
[1] 2.035005
# Forecast the chicken data 5 years into the future
sarima.for(chicken, n.ahead = 60, 2, 1, 0, 1, 0, 0, 12)
$pred
Jan Feb Mar Apr May Jun Jul
2016
2017 110.5358 110.5612 110.5480 110.7055 111.0047 111.1189 111.1552
2018 111.8108 111.9782 112.1330 112.3431 112.5991 112.7952 112.9661
2019 114.1331 114.3464 114.5556 114.7827 115.0247 115.2473 115.4617
2020 116.7942 117.0224 117.2492 117.4819 117.7193 117.9505 118.1790
2021 119.5651 119.7980 120.0306 120.2650 120.5010 120.7350 120.9681
Aug Sep Oct Nov Dec
2016 111.0907 110.8740 110.6853 110.5045 110.5527
2017 111.1948 111.2838 111.3819 111.4825 111.6572
2018 113.1380 113.3260 113.5168 113.7085 113.9242
2019 115.6765 115.8965 116.1174 116.3386 116.5675
2020 118.4077 118.6380 118.8686 119.0993 119.3326
2021
$se
Jan Feb Mar Apr May Jun
2016
2017 3.7414959 4.1793190 4.5747009 4.9373266 5.2742129 5.5903499
2018 8.2010253 8.5605811 8.9054714 9.2372195 9.5572539 9.8667955
2019 12.0038164 12.2921541 12.5738417 12.8492868 13.1188976 13.3830477
2020 15.1557253 15.3959082 15.6323906 15.8653300 16.0948844 16.3212022
2021 17.8397890 18.0473081 18.2524651 18.4553364 18.6559977 18.8545213
Jul Aug Sep Oct Nov Dec
2016 0.6187194 1.3368594 2.0462419 2.6867986 3.2486625
2017 5.8893133 6.2367345 6.6253573 7.0309771 7.4344077 7.8255932
2018 10.1668604 10.4736807 10.7857727 11.0980056 11.4063211 11.7085266
2019 13.6420693 13.9002670 14.1573839 14.4122197 14.6638269 14.9117124
2020 16.5444204 16.7657634 16.9852163 17.2025022 17.4174076 17.6298379
2021 19.0509752
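The growing standard errors tell the story: the half-width of an approximate 95% band (1.96·se) grows from about 1.2 in the first forecast month (Aug 2016) to about 37 five years out (Jul 2021), which is why the acceptable range of prices becomes so large:

```python
se_first, se_last = 0.6187194, 19.0509752  # Aug 2016 and Jul 2021 from $se above
half_width_first = 1.96 * se_first
half_width_last = 1.96 * se_last
print(round(half_width_first, 2), round(half_width_last, 2))  # → 1.21 37.34
```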