Autoregressive distributed lag (ARDL) models are an integral part of estimating scientific processes over time. However, as we extend their usefulness by adding richness in dynamic specifications (through multiple lags of variables, either in levels or differences, or lags of the dependent variable), we begin to challenge our ability to draw meaningful inferences from coefficients alone. Variables may appear in multiple time periods, in multiple forms, and might even be filtered through lagged values of the dependent variable. Coefficients tell us about the immediate effect of some variable but have little to say about its long-run effect.
There is a better solution. Instead of performing intense operations to develop a closed-form or algebraic solution, we can rely on the power of computing to simulate the over-time response in the dependent variable of some model, given a corresponding change in an \(x\) variable. We often call these “changes” in \(x\) variables, and the associated response in the dependent variable \(y\), a “counterfactual response” (a simulated response to a “shock” we control).
dynamac helps simulate these counterfactuals. More generally, though, it is built to make using and drawing inferences from single-equation ARDL models as easy as possible. Moreover, it helps users implement the useful cointegration test from Pesaran, Shin, and Smith (2001): the ARDL-bounds testing procedure.
We illustrate the usefulness of these functions through example. After a brief discussion of the ARDL model in the general sense, we estimate a collection of these models and demonstrate both the challenges of the ARDL model in the abstract and the solutions of dynamac in particular.
The ARDL model has a general form in which \(y\), modeled in levels or differences, can depend on itself (in lagged levels or differences) and on up to \(k\) variables \(x\), appearing in contemporaneous (same-period, or at time \(t\)) levels, lagged levels, contemporaneous differences, or lagged differences. Conventionally, the number of lags of the dependent variable in levels is counted by \(p\), the number of lags of the dependent variable in differences by \(m\), the number of lags of the independent variables in levels by \(l\), and the number of lags of the independent variables in differences by \(q\) (note: \(l\) and \(q\) can begin at 0, i.e., contemporaneous effects).
The number of time periods of each variable can, theoretically, be quite large. Or, put differently, \(p\), \(m\), \(l\), and \(q\), especially \(l\) and \(q\), might be hard to account for. Analysts without strong theory about the timing of the effects often begin with a reasonable one-period restriction, or an ARDL model of the form
\[y_t = \alpha_0 + \phi_1 y_{t-1} + \theta_{1,0} x_{1,t} + \theta_{1,1} x_{1,t-1} + \cdots + \theta_{k,0}x_{k,t} + \theta_{k,1}x_{k,t-1} + \beta T + \epsilon_t\] where \(\alpha_0\) is a constant and \(\beta\) is the coefficient on a trend term \(T\). (The exact nature of this equation depends on whether \(y\) is stationary or integrated, as well as whether differences or lagged differences enter the model. But we will get to this below.)
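To make this concrete, a model of this restricted form can be estimated directly with lm(); here is a minimal sketch on simulated data (the series and the true coefficient values are entirely hypothetical):

```r
# A minimal sketch of the one-period ARDL above, estimated by hand with lm().
# y and x are simulated, with illustrative true values
# alpha_0 = 0.2, phi_1 = 0.5, theta_{1,0} = 0.3, theta_{1,1} = 0.2.
set.seed(1)
n <- 200
x <- rnorm(n)
y <- numeric(n)
for (t in 2:n) {
  y[t] <- 0.2 + 0.5 * y[t - 1] + 0.3 * x[t] + 0.2 * x[t - 1] + rnorm(1, sd = 0.5)
}
# Drop the first observation (lost to lagging), then regress y on its own lag
# and on current and lagged x
ardl <- lm(y[2:n] ~ y[1:(n - 1)] + x[2:n] + x[1:(n - 1)])
coef(ardl)  # estimates of alpha_0, phi_1, theta_{1,0}, theta_{1,1}
```

(In this simple case the long-run effect of \(x_1\) even has a closed form, \((\theta_{1,0} + \theta_{1,1})/(1 - \phi_1)\), but such expressions quickly become unwieldy as lags accumulate.)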
It’s useful at this point to stop and think: if I have multiple lags of a variable \(x\), and they are filtered through a lagged dependent variable \(y_{t-1}\), it might be difficult to get a sense of the “total” effect of \(x\) on \(y\). This becomes more and more difficult as \(x\) gains lagged levels \(l\) and potentially also lagged differences \(q\). Our coefficient estimates, while useful as estimates, are no longer as useful for direct interpretation. Put differently, the very flexibility of the ARDL model undoes its usefulness in interpretation! So we might seek an alternative way of interpreting these models. dynamac provides a unified way of estimating ARDL models and interpreting their effects. It also provides a way of implementing a popular test for cointegration. We cover these methods below.
dynamac includes two datasets. We will use the Wright (2017) dataset on income inequality. We can look at the first few rows of this dataset:
library(dynamac)
head(ineq)
#> year mood urate concern demcontrol incshare10 csentiment incshare01
#> 1 1966 60.793 3.8 0.465 1 0.3198 97.0 0.0837
#> 2 1967 61.332 3.8 0.475 1 0.3205 94.7 0.0843
#> 3 1968 58.163 3.6 0.490 1 0.3198 94.2 0.0835
#> 4 1969 54.380 3.5 0.507 0 0.3182 90.3 0.0802
#> 5 1970 60.555 4.9 0.519 0 0.3151 75.8 0.0780
#> 6 1971 64.077 5.9 0.531 0 0.3175 80.6 0.0779
concern is public concern about income inequality; incshare10 is the income share of the top ten percent; urate is the unemployment rate. Wright (2017) argues that concern about income inequality grows as inequality itself worsens and economic conditions deteriorate. A simple model, then, is that concern is a function of past values of itself, the level of income share held by the top ten percent, changing levels of income share held by the top ten percent, as well as changing levels of unemployment in the short term.
\[\Delta Concern_t = \alpha_0 + \phi_1 Concern_{t-1} + \theta_1 IncomeTop10_{t-1} + \beta_1 \Delta IncomeTop10_t + \beta_2 \Delta Unemployment_t + \epsilon_t\]
where the residuals are white noise. Let’s develop the corresponding ARDL model using dynamac.
Step 1 in any time series analysis is a visual inspection of the series coupled with formal tests of stationarity: whether the series has a constant mean, variance, and covariance over time (so that it reverts back to mean level), or if the series instead violates any of these three conditions. We advocate for the urca package (Pfaff, Zivot, and Stigler 2016) which includes a variety of tools and tests. Simply plotting the series reveals the following:
library(urca)
ts.plot(ineq$concern)
ts.plot(ineq$incshare10)
ts.plot(ineq$urate)
None of the three series looks especially well-behaved. concern rose quickly and has been moving sluggishly since. incshare10 has only grown over time, which cannot be mean-reverting (unlike a stationary series). urate experiences steep peaks and valleys with significant interludes in between. All three series look integrated. But we should be more discerning by using the empirical tests also included in urca. So we can execute the Augmented Dickey-Fuller (ADF) test, Phillips-Perron test, DF-GLS test, and KPSS test on each series. On concern this looks like:
summary(ur.df(ineq$concern, type = c("none"), lags = 1))
#>
#> ###############################################
#> # Augmented Dickey-Fuller Test Unit Root Test #
#> ###############################################
#>
#> Test regression none
#>
#>
#> Call:
#> lm(formula = z.diff ~ z.lag.1 - 1 + z.diff.lag)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -0.029626 -0.006624 0.000511 0.011123 0.033842
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> z.lag.1 0.002474 0.003597 0.688 0.495
#> z.diff.lag 0.131523 0.148125 0.888 0.379
#>
#> Residual standard error: 0.01422 on 45 degrees of freedom
#> Multiple Rsquared: 0.0243, Adjusted Rsquared: 0.01907
#> Fstatistic: 0.5603 on 2 and 45 DF, pvalue: 0.575
#>
#>
#> Value of test-statistic is: 0.6878
#>
#> Critical values for test statistics:
#> 1pct 5pct 10pct
#> tau1 -2.62 -1.95 -1.61
summary(ur.pp(ineq$concern, type = c("Z-tau"), model = c("constant"), use.lag = 1))
#>
#> ##################################
#> # Phillips-Perron Unit Root Test #
#> ##################################
#>
#> Test regression with intercept
#>
#>
#> Call:
#> lm(formula = y ~ y.l1)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -0.031791 -0.006041 0.000738 0.006481 0.031282
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 0.09997 0.02946 3.393 0.00143 **
#> y.l1 0.83010 0.05085 16.324 < 2e16 ***
#> 
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 0.01274 on 46 degrees of freedom
#> Multiple Rsquared: 0.8528, Adjusted Rsquared: 0.8496
#> Fstatistic: 266.5 on 1 and 46 DF, pvalue: < 2.2e16
#>
#>
#> Value of test-statistic, type: Z-tau is: -3.4373
#>
#> aux. Z statistics
#> Z-tau-mu -3.496
#>
#> Critical values for Z statistics:
#> 1pct 5pct 10pct
#> critical values -3.571174 -2.92277 -2.599003
summary(ur.ers(ineq$concern, type = c("DF-GLS"), model = c("constant"), lag.max = 1))
#>
#> ###############################################
#> # Elliot, Rothenberg and Stock Unit Root Test #
#> ###############################################
#>
#> Test of type DF-GLS
#> detrending of series with intercept
#>
#>
#> Call:
#> lm(formula = dfgls.form, data = data.dfgls)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -0.027024 -0.003352 0.003511 0.013156 0.036332
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> yd.lag 0.02951 0.03305 0.893 0.377
#> yd.diff.lag1 0.10824 0.14675 0.738 0.465
#>
#> Residual standard error: 0.01417 on 45 degrees of freedom
#> Multiple Rsquared: 0.0312, Adjusted Rsquared: 0.01186
#> Fstatistic: 0.7247 on 2 and 45 DF, pvalue: 0.4901
#>
#>
#> Value of test-statistic is: 0.8928
#>
#> Critical values of DF-GLS are:
#> 1pct 5pct 10pct
#> critical values -2.61 -1.95 -1.62
summary(ur.kpss(ineq$concern, type = c("mu"), use.lag = 1))
#>
#> #######################
#> # KPSS Unit Root Test #
#> #######################
#>
#> Test is of type: mu with 1 lags.
#>
#> Value of test-statistic is: 0.6415
#>
#> Critical value for a significance level of:
#> 10pct 5pct 2.5pct 1pct
#> critical values 0.347 0.463 0.574 0.739
The ADF test, Phillips-Perron test, and DF-GLS test all have null hypotheses of a unit root, all of which we fail to reject. The KPSS test has a null hypothesis of stationarity, which we do reject. We have compelling evidence, then, of integration (I(1)) in the series concern. We can check whether differencing is enough to make the series stationary by executing the same tests:
summary(ur.df(diff(ineq$concern), type = c("none"), lags = 1))
summary(ur.pp(diff(ineq$concern), type = c("Z-tau"), model = c("constant"), use.lag = 1))
summary(ur.ers(diff(ineq$concern), type = c("DF-GLS"), model = c("constant"), lag.max = 1))
summary(ur.kpss(diff(ineq$concern), type = c("mu"), use.lag = 1))
Each of the above tests, when run, indicates that the differenced series of concern is stationary. Having gathered this basic information about the history of our variables, we might be itching to estimate some preliminary models. To this point, R hasn’t offered much in the way of improving beyond the basic lm() framework for using regression to estimate time series models in a consistent syntax. We elaborate on this problem and illustrate our solution, dynardl.
dynardl
ARDL models are flexible, but their flexibility often results in variables of different lengths due to differencing and lagging. For instance, consider our simple model from above, where incshare10 appears as a first lag and a first difference, urate appears as a first difference, there is a lagged dependent variable, and concern is the dependent variable in differences. We can introduce a lag through the built-in lshift command in dynamac. The syntax is just lshift(variable, num.lags), where num.lags is the number of periods for the variable to be lagged. We can also difference a series through dshift. The syntax is just dshift(variable) for a first difference. For instance,
head(ineq$incshare10)
#> [1] 0.3198 0.3205 0.3198 0.3182 0.3151 0.3175
head(lshift(ineq$incshare10, 1))
#> [1] NA 0.3198 0.3205 0.3198 0.3182 0.3151
head(dshift(ineq$incshare10))
#> [1] NA 0.0007 -0.0007 -0.0016 -0.0031 0.0024
So the syntax for the simple model described would be easy to write. But notice the problem:
summary(lm(diff(ineq$concern) ~ lshift(ineq$concern, 1) + lshift(ineq$incshare10, 1) + dshift(ineq$incshare10) + dshift(ineq$urate)))
#> Error in model.frame.default(formula = diff(ineq$concern) ~ lshift(ineq$concern, : variable lengths differ (found for 'lshift(ineq$concern, 1)')
Introducing the lag changed the variable lengths, and we probably don’t want to have to think about time series operations in our model specification anyway. Other programs unified this operation by introducing a common syntax, like d. for differencing and l. for lagging, or even l.d. for lagged differences (what program could that be?). In R, though, it remains more of a nuisance than it should be. The dynardl command helps to remedy this challenge by encouraging the user to think about model structure in exactly the way outlined above: which variables \(x\) get lags \(l\) and lagged differences \(q\), and so on. The command expects a basic formula of all the variables in the model, the data set (data), if there is one, then lagged levels (lags) and lagged differences (lagdiffs) as lists, and differences (diffs) and contemporaneous levels (levels) as vectors. The sole exception is if the user wants to run the model in differences: then s/he needs to specify ec = TRUE (in an effort to force us to think critically about the error-correction form of the model). For instance, our above model now becomes (ignoring the simulate option for now)
dynardl(concern ~ incshare10 + urate, data = ineq,
lags = list("concern" = 1, "incshare10" = 1),
diffs = c("incshare10", "urate"),
ec = TRUE, simulate = FALSE)
Purely hypothetically, if we wanted more lagged levels of concern, we would just change 1 to c(1:5) for lags at \(t-1\) to \(t-5\), or any other number of lags. If we wanted to include the first-difference of urate lagged at periods \(t-1\) and \(t-2\), we would now run
dynardl(concern ~ incshare10 + urate, data = ineq,
lags = list("concern" = 1, "incshare10" = 1),
diffs = c("incshare10", "urate"),
lagdiffs = list("urate" = c(1, 2)),
ec = TRUE, simulate = FALSE)
If the variable can appear in multiple periods (lags and lagdiffs), it must be specified as a list. If it can only appear contemporaneously (levels and diffs), it must be specified as a vector. (The alternative was to specify levels through a lag at time 0; this did not seem practical.) We can also add or suppress a constant from the model by specifying constant = TRUE/FALSE, and we can do the same with a linear trend term with trend = TRUE/FALSE (by default, constants are included and trends are not).
The option ec specifies the nature of the dependent variable. If it is possibly error-correcting (ec = TRUE), it is run in differences, or period-over-period changes. If it is not (ec = FALSE), the dependent variable is run in levels.
At this point, dynardl is just a unifying way of thinking about time series models for estimation. Yet it offers two other important advantages: interpreting the effects of our variables, and executing a powerful test for cointegration. We will start with the latter. Remember testing for I(1) processes earlier: each test indicated that concern was I(1). Running the same tests for each of incshare10 and urate suggests that all three series are I(1). This might cause us concern about cointegration: the special relationship between I(1) series in which the series share a long-run relationship, even if they move apart in the short term. There is nothing wrong with cointegrating series; they are just more difficult to estimate. More importantly, they are difficult to uncover. Traditional methods, like the Engle and Granger (1987) two-step method or likelihood-based approaches (Johansen 1995), too often conclude cointegration when it does not exist. An alternative test (Pesaran, Shin, and Smith 2001), which we refer to as the ARDL-bounds procedure, performs much better in small samples (\(t < 80\)), but it is more cumbersome to implement. The package dynamac is meant to resolve that deficiency by implementing the critical-value testing procedure for the user. Following Philips (2018), it requires estimating the relationship in error-correction form, obtaining statistics of interest, and then testing them via the function pssbounds.
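For completeness, here is a sketch of the corresponding test calls for the regressors (output omitted; as noted above, each points toward a unit root in the levels of incshare10 and urate):

```r
library(dynamac)  # for the ineq data
library(urca)     # for the unit root tests
# The same battery of tests, applied to the regressors
# (output omitted; each points toward a unit root in levels)
summary(ur.df(ineq$incshare10, type = c("none"), lags = 1))
summary(ur.kpss(ineq$incshare10, type = c("mu"), use.lag = 1))
summary(ur.df(ineq$urate, type = c("none"), lags = 1))
summary(ur.kpss(ineq$urate, type = c("mu"), use.lag = 1))
```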
pssbounds
The ARDL-bounds procedure begins with two preliminaries. First, we must ensure that the regressors are not of order I(2) or more and that any seasonal components have been removed. We demonstrated this above when we found that the first-differences of incshare10 and urate were both stationary. In addition, looking at the series, there were no seasonal components. Second, we must ensure that the dependent variable is integrated of order I(1). And again, above, we demonstrated that concern is integrated.
The next step in ARDL-bounds is to estimate the error-correction form of the model. Two points are key: the independent variables that are potentially I(1) must be entered in levels, and the resulting residuals of the error-correction ARDL model must be white noise (random, with no residual autocorrelation). Here, that means that we run our dependent variable in differences, we include the lagged level of the dependent variable, we include the lagged levels of the potentially error-correcting variable, incshare10, and the short-run effects of urate through differences. Returning to our model above, and using dynardl, this is now straightforward:
res1 <- dynardl(concern ~ incshare10 + urate, data = ineq,
lags = list("concern" = 1, "incshare10" = 1),
diffs = c("incshare10", "urate"),
ec = TRUE, simulate = FALSE)
#> [1] "Error correction (EC) specified; dependent variable to be run in differences."
summary(res1)
#>
#> Call:
#> lm(formula = as.formula(paste(paste(dvnamelist), "~", paste(colnames(IVs),
#> collapse = "+"), collapse = " ")))
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -0.025848 -0.005293 0.000692 0.006589 0.031563
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 0.122043 0.027967 4.364 7.87e-05 ***
#> l.1.concern -0.167655 0.048701 -3.443 0.0013 **
#> d.1.incshare10 0.800585 0.296620 2.699 0.0099 **
#> d.1.urate 0.001118 0.001699 0.658 0.5138
#> l.1.incshare10 0.068028 0.031834 2.137 0.0383 *
#> 
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 0.01169 on 43 degrees of freedom
#> (1 observation deleted due to missingness)
#> Multiple Rsquared: 0.3671, Adjusted Rsquared: 0.3083
#> Fstatistic: 6.236 on 4 and 43 DF, pvalue: 0.0004755
Next, we need to ensure that the residuals from this model are white noise. To help with this, we also introduce dynardl.auto.correlated. This expects a dynardl model and implements a few white-noise-residual tests, with readable output that reminds us of the null hypotheses. Here, we just run
dynardl.auto.correlated(res1)
#>
#> 
#> Breusch-Godfrey LM Test
#> Test statistic: 3.704
#> pvalue: 0.054
#> H_0: no autocorrelation up to AR 1
#>
#> 
#> Shapiro-Wilk Test for Normality
#> Test statistic: 0.965
#> pvalue: 0.158
#> H_0: residuals are distributed normal
#>
#> 
#> Log-likelihood: 148.094
#> AIC: -284.187
#> BIC: -272.96
#> Note: AIC and BIC calculated with k = 5 on T = 48 observations.
#>
#> 
#> Breusch-Godfrey test indicates we reject the null hypothesis of no autocorrelation at p < 0.10.
#> Add lags to remove autocorrelation before running dynardl simulations.
At a reasonable level of significance (\(p < 0.10\)), the Breusch-Godfrey test indicates that we reject the null hypothesis of no autocorrelation in the residuals of the model in res1. Philips (2018) indicates that the next step is to add lagged first-differences to our model. To add a lagged difference of \(y\), we would run
res2 <- dynardl(concern ~ incshare10 + urate, data = ineq,
lags = list("concern" = 1, "incshare10" = 1),
diffs = c("incshare10", "urate"),
lagdiffs = list("concern" = 1),
ec = TRUE, simulate = FALSE)
#> [1] "Error correction (EC) specified; dependent variable to be run in differences."
summary(res2)
#>
#> Call:
#> lm(formula = as.formula(paste(paste(dvnamelist), "~", paste(colnames(IVs),
#> collapse = "+"), collapse = " ")))
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -0.027161 -0.006046 0.001101 0.007727 0.026620
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 0.145837 0.031144 4.683 3.09e-05 ***
#> l.1.concern -0.195132 0.052971 -3.684 0.000665 ***
#> ld.1.concern 0.195748 0.131225 1.492 0.143434
#> d.1.incshare10 0.679022 0.302936 2.241 0.030471 *
#> d.1.urate 0.001068 0.001665 0.641 0.524878
#> l.1.incshare10 0.085775 0.032804 2.615 0.012434 *
#> 
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 0.01144 on 41 degrees of freedom
#> (2 observations deleted due to missingness)
#> Multiple Rsquared: 0.4174, Adjusted Rsquared: 0.3464
#> Fstatistic: 5.875 on 5 and 41 DF, pvalue: 0.0003507
and then again test for autocorrelation.
dynardl.auto.correlated(res2)
#>
#> 
#> Breusch-Godfrey LM Test
#> Test statistic: 0.453
#> pvalue: 0.501
#> H_0: no autocorrelation up to AR 1
#>
#> 
#> Shapiro-Wilk Test for Normality
#> Test statistic: 0.986
#> pvalue: 0.857
#> H_0: residuals are distributed normal
#>
#> 
#> Log-likelihood: 146.636
#> AIC: -279.271
#> BIC: -266.32
#> Note: AIC and BIC calculated with k = 6 on T = 47 observations.
#>
#> 
length(res2$model$residuals)
#> [1] 47
We can now be much more confident that there is no remaining autocorrelation in the residuals. At this point, we can execute the ARDL-bounds test procedure. We need a few pieces of information: the length of the time series, the \(t\)-statistic on the lagged dependent variable in the ARDL error-correction model, the “case” of the regression (a combination of whether the intercept, if any, and trend, if any, were restricted (Pesaran, Shin, and Smith 2001)), and the number of regressors \(k\) appearing in first lags (NOT including the lagged dependent variable). We also need the number of observations in the model time series and the \(F\)-statistic on the restriction that the first lags of each of the variables in the model are jointly equal to zero.
From our regression, we know the \(t\)-statistic on the lagged dependent variable is -3.684 (just looking at the output). Additionally, we estimated a model with an unrestricted intercept and no trend, which happens to be case 3 (we know: we’ll return to this in a moment). There is \(k = 1\) variable in first lags (incshare10), and the number of observations in the model is 47 periods. We now only need the test of the restriction that the first lags are equal to zero. We can calculate this by hand through coefficient testing. If we have all of the coefficients, we just need to compare them to the set that are lagged levels:
coef(res2$model)
#> (Intercept) l.1.concern ld.1.concern d.1.incshare10 d.1.urate
#> 0.145837457 -0.195131693 0.195747775 0.679022362 0.001067503
#> l.1.incshare10
#> 0.085775490
# Only the last coefficient is a first lagged level
B <- coef(res2$model)
V <- vcov(res2$model)
R <- matrix(c(0, 0, 0, 0, 0, 1), nrow = 1)
k <- sum(R)
# Restriction is that it is equal to 0
q <- 0
fstat <- (1/k) * t(R %*% B - q) %*% solve(R %*% V %*% t(R)) %*% (R %*% B - q)
fstat
#> [,1]
#> [1,] 6.837322
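As a quick sanity check (our own aside, not part of the package): because this is a single linear restriction, the \(F\)-statistic should simply be the square of the \(t\)-statistic on l.1.incshare10 from the summary above.

```r
# Sanity check: with one restriction, F equals the squared t-statistic of
# l.1.incshare10 (estimate / standard error, from summary(res2) above)
t.stat <- 0.085775 / 0.032804
t.stat^2  # approximately 6.837, matching fstat
```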
Now we’re ready for pssbounds:
pssbounds(obs = 47, fstat = 6.837322, tstat = -3.684, case = 3, k = 1)
#>
#> PESARAN, SHIN AND SMITH (2001) COINTEGRATION TEST
#>
#> Observations: 47
#> Number of Regressors (k): 1
#> Case: 3
#>
#> 
#>  F-test 
#> 
#> < I(0)  I(1) >
#> 10% critical value 4.19 4.94
#> 5% critical value 5.22 6.07
#> 1% critical value 7.56 8.685
#>
#>
#> F-statistic = 6.837
#> 
#>  t-test 
#> 
#> < I(0)  I(1) >
#> 10% critical value -2.57 -2.91
#> 5% critical value -2.86 -3.22
#> 1% critical value -3.43 -3.82
#>
#>
#> t statistic = -3.684
#> 
#>
#> t-statistic note: Small-sample critical values not provided for Case III. Asymptotic critical values used.
Finally, a payoff. The \(t\)-statistic and \(F\)-statistic are situated between small-sample critical values. Depending on the level of significance we preselected, these values support a number of different conclusions. If the \(F\)-statistic exceeds the upper I(1) critical value, we may conclude cointegration; thus, we can be confident that we have a well-specified model. If the \(F\)-statistic falls below the lower I(0) critical value, Pesaran, Shin, and Smith note that this indicates that all regressors are in fact stationary; thus, cointegration cannot exist. If the \(F\)-statistic falls between the lower I(0) and upper I(1) critical values, the results are inconclusive. This means that cointegration may exist, but further testing and respecification of the model are needed. Users who end up with these results are encouraged to see Philips (2018), who provides a strategy for dealing with them.
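This decision rule can be encoded in a small, hypothetical helper function (our illustration, not part of dynamac; the default bounds below are the 5% F-test critical values for case 3, \(k = 1\) printed above, not general-purpose constants):

```r
# A hypothetical helper encoding the ARDL-bounds F-test decision rule.
# Defaults are the 5% case-3, k = 1 bounds printed by pssbounds above.
bounds.decision <- function(fstat, lower = 5.22, upper = 6.07) {
  if (fstat > upper) {
    "cointegration"
  } else if (fstat < lower) {
    "all regressors stationary; no cointegration"
  } else {
    "inconclusive"
  }
}
bounds.decision(6.837)  # the F-statistic from our model: concludes cointegration
```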
In our model here, since the value of the \(F\)-statistic exceeds the critical value at the upper I(1) bound of the test at the 5% level, we may conclude that Income Top 10 (the variable in the model in levels) and the dependent variable, Concern, are in a cointegrating relationship. This is furthered by the \(t\)-statistic of -3.684 falling below the 5% critical value for I(1). Thus, we can be confident both in our model specification and in the unique, long-run relationship that exists between the two variables.
Calculating all of those values by hand was unnecessarily involved, especially since most of the values are saved post-estimation anyway. Thus, our preferred way of running the ARDL-bounds test is simply to pass the model to pssbounds. In other words, the test above is equivalent to running
pssbounds(res2)
#>
#> PESARAN, SHIN AND SMITH (2001) COINTEGRATION TEST
#>
#> Observations: 47
#> Number of Regressors (k): 1
#> Case: 3
#>
#> 
#>  F-test 
#> 
#> < I(0)  I(1) >
#> 10% critical value 4.19 4.94
#> 5% critical value 5.22 6.07
#> 1% critical value 7.56 8.685
#>
#>
#> F-statistic = 12.204
#> 
#>  t-test 
#> 
#> < I(0)  I(1) >
#> 10% critical value -2.57 -2.91
#> 5% critical value -2.86 -3.22
#> 1% critical value -3.43 -3.82
#>
#>
#> t statistic = -3.684
#> 
#>
#> t-statistic note: Small-sample critical values not provided for Case III. Asymptotic critical values used.
Any dynardl object run in an error-correction format is available for pssbounds post-estimation. This, of course, is not meant to imply that all dynardl models should be tested for cointegration. Stationary variables cannot be cointegrated with other variables. As such, we would want to run those models in levels, with ec = FALSE. As always, we point users to the importance of pre-regression testing of the nature of our variables, most specifically whether or not they are integrated. dynardl will attempt to be as helpful as possible, but in the end, only the user knows which model is appropriate and can specify that model correctly.
We had another motivation. dynardl + pssbounds are a powerful combination for estimating and testing for cointegration in a unified framework. But once we have tested for cointegration, we still want inferences about the nature of the effect of some \(x\) variable on the dependent variable, and these inferences become much more difficult to obtain as our models increase in complexity. In the next section, we elaborate on the ability of dynardl to provide these inferences through the process we outlined earlier: counterfactual simulation of a response to a shock in some \(x\) variable. These inferences do not require anything beyond what we have already covered: a dynardl model and a theoretically interesting \(x\) variable.
dynardl
ARDL models are useful because they are flexible, but their flexibility undermines our ability to make cumulative sense of the coefficients estimated in any given model. An alternative approach to interpretation is to use the coefficients from an estimated model to simulate meaningful responses in the dependent variable to counterfactual changes in an independent variable \(x\) (that we control), allowing the change to filter through the various forms of the \(x\) variable in the model, as well as different forms of the \(y\) variable (like differences and lagged levels) that might be included.
dynamac handles this simulated interpretation through a function we have already seen: dynardl. All we need to do is specify simulate = TRUE, and we will produce simulated responses: we can observe how a change to a variable in a dynardl model produces a corresponding change in the dependent variable. Other arguments are required, but only one has no default: we need to know which \(x\) variable to shock (the shockvar). This “shock” means that, at a time specified by the user, the value of an \(x\) variable will move to some level. If the variable is in levels or lagged levels, its new value becomes the pre-shock average plus whatever the shock value is. If the variable is in differences or lagged differences, the shock lasts for one period (as a permanent change in a differenced variable would imply that it is changing every period!).
dynardl has reasonable defaults for all of the other required parameters: simulations default to a range of 20 periods, lines are drawn at roughly sig = 95% confidence, the shockvar is shocked by a standard deviation (the shockval), we use 1,000 sims to simulate the average response, we don’t force the other \(x\) variables to any values (we allow them to take their means, except for differenced variables, which we set to zero, assuming that, period-over-period, there is no change), we allow for 20 periods of burnin, and we create predicted values of the dependent variable rather than expected values. All of these options can be controlled by the user, but we’ll return to that in a moment.
The simulation process is fairly straightforward. dynardl uses the coefficients from the model. It draws a number (specifically, sims) of values of the coefficients \(\hat\beta\) from the estimated regression model. The distribution is assumed to be multivariate normal with mean \(\hat\beta\) and the variance-covariance of the estimated model. Uncertainty is incorporated by simulating \(\hat\sigma^2\) as a scaled draw from the chi-squared distribution. These fit together by using values of the independent variables (usually their means: see the preceding paragraph) \(X\), multiplying by \(\hat\beta\) to get predicted values of the dependent variable \(y\), and then using \(\hat\sigma^2\) to introduce uncertainty into the predicted values. If you want to exert more direct control over the values of any \(x\) variables used, you can forceset them to any other value you like. This is not limited to means or integers; if you have any substantively interesting value you’d like to hold a variable at, you are free to specify it.
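The general simulation logic can be sketched in a few lines of base R plus MASS, here with a stand-in lm() model on built-in data (this illustrates the approach described above, not dynardl’s exact internals):

```r
library(MASS)  # for mvrnorm
set.seed(123)
mod <- lm(mpg ~ wt + hp, data = mtcars)  # stand-in model on built-in data
sims <- 1000
# draw coefficient vectors from a multivariate normal centered on the
# estimates, with the model's estimated variance-covariance matrix
beta.draws <- mvrnorm(sims, mu = coef(mod), Sigma = vcov(mod))
# draw sigma^2 as a scaled draw from the chi-squared distribution
df.resid <- mod$df.residual
sigma2.draws <- df.resid * summary(mod)$sigma^2 / rchisq(sims, df.resid)
# predicted values at chosen X values (here, variable means), with
# fundamental uncertainty added through sigma
X <- c(1, mean(mtcars$wt), mean(mtcars$hp))
y.pred <- as.vector(beta.draws %*% X) + rnorm(sims, 0, sqrt(sigma2.draws))
quantile(y.pred, probs = c(0.025, 0.5, 0.975))  # simulated response interval
```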
But what we’re really interested in is the effect of some variable \(x\). In the world of dynardl, this is our shockvar. We specify a variable from the model, tell dynardl how much we want it to change by (in shockval, defaulting to one standard deviation of the shockvar) and when we want it to change (time), and then observe the change in \(y\).
Let’s go back to our earlier model. Remember we ran
res2 <- dynardl(concern ~ incshare10 + urate, data = ineq,
lags = list("concern" = 1, "incshare10" = 1),
diffs = c("incshare10", "urate"),
lagdiffs = list("concern" = 1),
ec = TRUE, simulate = FALSE)
summary(res2)
Here, we set simulate = FALSE because we were just illustrating estimation without simulations (given that simulations take a few seconds to run, we may only want to simulate changes once we have a final model, free of autocorrelation and the other problems we tested for). Now we’ll set simulate = TRUE and specify an \(x\) variable whose response we want to observe. We’ll observe the changes resulting from incshare10, given that its lagged level demonstrated a significant effect. In other words, our shockvar = "incshare10". By default, dynardl will shock it by its standard deviation, but we can observe any change we like with shockval.
res2 <- dynardl(concern ~ incshare10 + urate, data = ineq,
lags = list("concern" = 1, "incshare10" = 1),
diffs = c("incshare10", "urate"),
lagdiffs = list("concern" = 1),
ec = TRUE, simulate = TRUE,
shockvar = "incshare10")
#> [1] "Error correction (EC) specified; dependent variable to be run in differences."
#> [1] "incshare10 shocked by one standard deviation of incshare10 by default."
#> [1] "dynardl estimating ..."
#>   |=================================================================| 100%
summary(res2$model)
#>
#> Call:
#> lm(formula = as.formula(paste(paste(dvnamelist), "~", paste(colnames(IVs),
#>     collapse = "+"), collapse = " ")))
#>
#> Residuals:
#>       Min        1Q    Median        3Q       Max
#> -0.027161 -0.006046  0.001101  0.007727  0.026620
#>
#> Coefficients:
#>                 Estimate Std. Error t value Pr(>|t|)
#> (Intercept)     0.145837   0.031144   4.683 3.09e-05 ***
#> l.1.concern    -0.195132   0.052971  -3.684 0.000665 ***
#> ld.1.concern    0.195748   0.131225   1.492 0.143434
#> d.1.incshare10  0.679022   0.302936   2.241 0.030471 *
#> l.1.incshare10 -0.085775   0.032804  -2.615 0.012434 *
#> d.1.urate       0.001068   0.001665   0.641 0.524878
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 0.01144 on 41 degrees of freedom
#>   (2 observations deleted due to missingness)
#> Multiple R-squared: 0.4174, Adjusted R-squared: 0.3464
#> F-statistic: 5.875 on 5 and 41 DF, p-value: 0.0003507
It looks somewhat goofy in the vignette, as dynardl displays a progress bar to inform you of how many simulations have been completed. But the payoff is in res2$simulation. We get to observe the effect of a standard-deviation shock in incshare10 on the dependent variable. This response is best visualized in a plot. To see this plot, the model can be saved to an object, and plots can be created with dynardl.simulation.plot.
dynardl.simulation.plot(res2, type = "area", response = "levels")
Here, the shock has an immediate effect that does not dissipate for a long time. So long, in fact, that we might want to lengthen the range of the simulation.
res3 <- dynardl(concern ~ incshare10 + urate, data = ineq,
        lags = list("concern" = 1, "incshare10" = 1),
        diffs = c("incshare10", "urate"),
        lagdiffs = list("concern" = 1),
        ec = TRUE, simulate = TRUE, range = 30,
        shockvar = "incshare10")
#> [1] "Error correction (EC) specified; dependent variable to be run in differences."
#> [1] "incshare10 shocked by one standard deviation of incshare10 by default."
#> [1] "dynardl estimating ..."
#>   |=================================================================| 100%
dynardl.simulation.plot(res3, type = "area", response = "levels")
dynardl.simulation.plot(res3, type = "spike", response = "levels")
In res3 there are actually three sets of output. The first is the model itself, the second is the information for pssbounds, and the last is the simulation output for plotting. We mention this so that users can use their usual TeX shortcuts for creating tables, customize plots, or run pssbounds later.
res3$model
#>
#> Call:
#> lm(formula = as.formula(paste(paste(dvnamelist), "~", paste(colnames(IVs),
#> collapse = "+"), collapse = " ")))
#>
#> Coefficients:
#>    (Intercept)     l.1.concern    ld.1.concern  d.1.incshare10
#>       0.145837       -0.195132        0.195748        0.679022
#> l.1.incshare10       d.1.urate
#>      -0.085775        0.001068
res3$pssbounds
#> obs k tstat fstat case
#> 1  47 1 -3.683747 12.20351    3
res3$simulation
#> time central ll95 ll90 ll75 ul75 ul90
#> 1 1 0.5794767 0.5555065 0.5594187 0.5654940 0.5938743 0.6001877
#> 2 2 0.5788851 0.5560116 0.5592635 0.5647903 0.5926511 0.5981659
#> 3 3 0.5788756 0.5544378 0.5594042 0.5658421 0.5924643 0.5982560
#> 4 4 0.5792519 0.5540245 0.5584665 0.5648383 0.5934459 0.5999197
#> 5 5 0.5797144 0.5558411 0.5597849 0.5654411 0.5935222 0.5999085
#> 6 6 0.5801368 0.5576851 0.5608233 0.5655160 0.5940250 0.6002017
#> 7 7 0.5805170 0.5572423 0.5608670 0.5674000 0.5931397 0.5997907
#> 8 8 0.5800098 0.5562504 0.5599846 0.5663659 0.5941620 0.6018511
#> 9 9 0.5807871 0.5562658 0.5607199 0.5660883 0.5941517 0.6011490
#> 10 10 0.6177245 0.5803872 0.5846411 0.5938762 0.6403027 0.6524127
#> 11 11 0.5983338 0.5717285 0.5754910 0.5832282 0.6133641 0.6195227
#> 12 12 0.5942016 0.5693608 0.5742775 0.5801036 0.6081156 0.6141539
#> 13 13 0.5870353 0.5645176 0.5677791 0.5738063 0.6011466 0.6074455
#> 14 14 0.5823206 0.5584486 0.5624044 0.5677795 0.5965670 0.6024216
#> 15 15 0.5776844 0.5532430 0.5578371 0.5645989 0.5919450 0.5985151
#> 16 16 0.5734155 0.5510582 0.5540105 0.5596456 0.5877269 0.5933247
#> 17 17 0.5704866 0.5467544 0.5514823 0.5573510 0.5840438 0.5886174
#> 18 18 0.5684356 0.5439032 0.5477649 0.5549472 0.5829990 0.5892947
#> 19 19 0.5659084 0.5423248 0.5462060 0.5519173 0.5797636 0.5854630
#> 20 20 0.5638772 0.5408137 0.5439559 0.5502624 0.5776417 0.5846955
#> 21 21 0.5627201 0.5390736 0.5435235 0.5486617 0.5762680 0.5822163
#> 22 22 0.5609477 0.5384585 0.5412887 0.5478207 0.5744375 0.5805584
#> 23 23 0.5600664 0.5364928 0.5411544 0.5464902 0.5745412 0.5799471
#> 24 24 0.5592720 0.5345447 0.5383505 0.5450392 0.5732651 0.5788272
#> 25 25 0.5583611 0.5349060 0.5381765 0.5448945 0.5720599 0.5769109
#> 26 26 0.5579199 0.5350668 0.5384289 0.5442226 0.5714737 0.5785907
#> 27 27 0.5578897 0.5360311 0.5392122 0.5444253 0.5717208 0.5777534
#> 28 28 0.5572270 0.5321589 0.5363469 0.5431036 0.5709652 0.5762664
#> 29 29 0.5569174 0.5326767 0.5368865 0.5438637 0.5702943 0.5764032
#> 30 30 0.5563050 0.5333968 0.5377171 0.5436316 0.5691705 0.5752497
#> ul95 d.central d.ll95 d.ll90 d.ll75
#> 1 0.6044906 5.797285e05 0.03258869 0.028064377 0.021097636
#> 2 0.6037172 6.425034e04 0.03461479 0.028778283 0.020318574
#> 3 0.6030814 5.821833e04 0.03389689 0.027770919 0.019004854
#> 4 0.6049648 3.856628e04 0.03531626 0.028070154 0.018819091
#> 5 0.6040770 8.637623e05 0.03414811 0.028386181 0.018828535
#> 6 0.6043737 4.023303e05 0.03395306 0.027798772 0.019383603
#> 7 0.6043751 4.216895e05 0.03279307 0.026750429 0.018355350
#> 8 0.6050736 8.873955e04 0.03433963 0.030158271 0.021232120
#> 9 0.6050755 1.284541e03 0.03597918 0.028094879 0.017935203
#> 10 0.6581088 3.616006e02 0.01084113 0.003430116 0.007353280
#> 11 0.6231753 5.632801e02 0.10406899 0.096345981 0.084750201
#> 12 0.6183009 1.525842e02 0.02032847 0.014947521 0.005887126
#> 13 0.6112684 3.034129e03 0.03487783 0.030912310 0.021294125
#> 14 0.6070564 2.451642e03 0.02794129 0.023367788 0.016059534
#> 15 0.6027390 7.847237e05 0.03255500 0.027366690 0.018514467
#> 16 0.5978729 3.673088e04 0.03428801 0.027765790 0.019449610
#> 17 0.5931405 1.340052e03 0.03122489 0.025401068 0.018347864
#> 18 0.5935402 8.778637e04 0.03119466 0.026910303 0.019084653
#> 19 0.5898168 4.761831e04 0.03443206 0.027858836 0.019383682
#> 20 0.5885254 4.960031e04 0.03053660 0.025721629 0.018356408
#> 21 0.5866523 8.740207e04 0.03265458 0.026749517 0.017793182
#> 22 0.5849789 6.152305e04 0.03130801 0.026549257 0.019790169
#> 23 0.5843374 8.910834e04 0.03263661 0.026185292 0.018036618
#> 24 0.5833096 8.693981e05 0.03345008 0.029182061 0.020631451
#> 25 0.5809242 1.165660e04 0.03275007 0.026913718 0.018532083
#> 26 0.5834067 4.697206e04 0.03128653 0.026329988 0.018908312
#> 27 0.5821968 4.110217e04 0.03301781 0.027072325 0.017601742
#> 28 0.5807487 6.325209e04 0.03264493 0.028102777 0.020364485
#> 29 0.5804894 3.531282e04 0.03316408 0.026526020 0.017846205
#> 30 0.5796326 3.027876e04 0.03198880 0.025850079 0.018377681
#> d.ul75 d.ul90 d.ul95 shocktime
#> 1 0.01947088 0.02750265 0.03338453 10
#> 2 0.01930720 0.02751583 0.03206043 10
#> 3 0.01995676 0.02895880 0.03454512 10
#> 4 0.01883522 0.02800036 0.03394754 10
#> 5 0.01967528 0.02710586 0.03299526 10
#> 6 0.02014378 0.02839193 0.03374534 10
#> 7 0.01819187 0.02613426 0.03245535 10
#> 8 0.01879549 0.02675922 0.03295315 10
#> 9 0.02139057 0.02932508 0.03592252 10
#> 10 0.06514801 0.07597122 0.08431512 10
#> 11 0.02764855 0.01759594 0.01317634 10
#> 12 0.03623831 0.04636788 0.05140281 10
#> 13 0.01574474 0.02358902 0.02776624 10
#> 14 0.02218702 0.03110861 0.03553306 10
#> 15 0.01972446 0.02811398 0.03672770 10
#> 16 0.01991452 0.02794540 0.03259613 10
#> 17 0.02049962 0.02874885 0.03249872 10
#> 18 0.02047953 0.02981825 0.03396927 10
#> 19 0.01880285 0.02812521 0.03387522 10
#> 20 0.01907667 0.02873078 0.03473159 10
#> 21 0.01952261 0.02622956 0.03199736 10
#> 22 0.01884640 0.02854075 0.03309345 10
#> 23 0.01990317 0.02722658 0.03328081 10
#> 24 0.02048872 0.02843833 0.03367225 10
#> 25 0.01843965 0.02854605 0.03314586 10
#> 26 0.01902324 0.02945252 0.03273466 10
#> 27 0.01820500 0.02684961 0.03203890 10
#> 28 0.01837958 0.02619682 0.02944270 10
#> 29 0.01906939 0.02758203 0.03159880 10
#> 30 0.01650978 0.02591626 0.03069302 10
Only models where ec = TRUE will have $pssbounds information, as only those models can possibly be testing a cointegrating relationship. We suspect that no one will need to use these values by hand, as you can just pass the whole object via pssbounds(res3) to get the same results.
Other options might be of interest. To smooth our confidence intervals (note: NOT to make us more confident), we can increase the number of simulations. Additionally, the plotting function allows the user to produce plots in grayscale (for publications), as well as to show deviations from the mean of \(y\) (response = "mean.changes") rather than some level (which depends on the averages of the \(x\) variables). Consider the following (notice the sims argument of dynardl and the new arguments to dynardl.simulation.plot, as well as the new y-axis in the following figure):
res4 <- dynardl(concern ~ incshare10 + urate, data = ineq,
        lags = list("concern" = 1, "incshare10" = 1),
        diffs = c("incshare10", "urate"),
        lagdiffs = list("concern" = 1),
        ec = TRUE, simulate = TRUE, range = 30, sims = 10000,
        shockvar = "incshare10")
#> [1] "Error correction (EC) specified; dependent variable to be run in differences."
#> [1] "incshare10 shocked by one standard deviation of incshare10 by default."
#> [1] "dynardl estimating ..."
#>   |=================================================================| 100%
dynardl.simulation.plot(res4, type = "area", response = "levels.from.mean", bw = TRUE)
The full extent of these options is addressed in the documentation.
Finally, there is some question as to whether quantities of interest from these types of stochastic simulations are best summarized by the means or the medians of the resulting distributions. In most cases, the difference is likely to be minimal. In cases of skew, though, the median might serve significantly better than the mean (described in Rainey (2017)). Here, we can do that by setting qoi = "median".
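The intuition is that a skewed simulation distribution drags the mean toward its long tail, while the median stays at the "typical" draw. A quick illustration with any right-skewed distribution (plain Python, unrelated to dynamac's internals):

```python
import random
import statistics

random.seed(42)
# Log-normal draws are right-skewed: a handful of very large values
# pulls the mean above the median.
# Theoretical median is exp(0) = 1; theoretical mean is exp(0.5), about 1.65.
draws = [random.lognormvariate(0, 1) for _ in range(100_000)]

mean_qoi = statistics.mean(draws)
median_qoi = statistics.median(draws)
# Summarizing these draws by the mean would overstate the typical value;
# the median is robust to the tail.
```

The same logic applies to the simulated distribution of predicted \(y\) values at each time point.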
res5 <- dynardl(concern ~ incshare10 + urate, data = ineq,
        lags = list("concern" = 1, "incshare10" = 1),
        diffs = c("incshare10", "urate"),
        lagdiffs = list("concern" = 1),
        ec = TRUE, simulate = TRUE, range = 30,
        shockvar = "incshare10", qoi = "median")
#> [1] "Error correction (EC) specified; dependent variable to be run in differences."
#> [1] "incshare10 shocked by one standard deviation of incshare10 by default."
#> [1] "dynardl estimating ..."
#>   |=================================================================| 100%
dynardl.simulation.plot(res5, type = "area", response = "levels")
A final note (beyond dynardl) about data and modeling
dynamac is meant to be a unifying package for estimating and interpreting time series ARDL models. It is not, though, a data manager. It assumes that your data are ordered, that the progression of the time series makes sense (i.e., there is a consistent unit of time separating the ordered observations), that there are no problematic missing values, and the other usual pre-estimation data considerations. Users should know and explore their datasets well before passing those data to any statistical software.
Nor will dynamac stop you from running "bad idea" models. Users should be careful about the order of integration of their variables, whether seasonal unit roots are present, whether variables have nuanced lagged-difference structures, and so on. We offer commands (like dynardl.auto.correlated) to help users make these decisions on their path to a final model. But care must be taken at every step.
Engle, Robert F., and Clive W. J. Granger. 1987. "Co-Integration and Error Correction: Representation, Estimation, and Testing." Econometrica 55 (2): 251–76.
Johansen, Soren. 1995. Likelihood-Based Inference in Cointegrated Vector Autoregressive Models. Oxford University Press.
Pesaran, M. Hashem, Yongcheol Shin, and Richard J. Smith. 2001. "Bounds Testing Approaches to the Analysis of Level Relationships." Journal of Applied Econometrics 16 (3): 289–326. https://doi.org/10.1002/jae.616.
Pfaff, Bernhard, Eric Zivot, and Matthieu Stigler. 2016. urca: Unit Root and Cointegration Tests for Time Series Data. https://CRAN.R-project.org/package=urca.
Philips, Andrew Q. 2018. "Have Your Cake and Eat It Too? Cointegration and Dynamic Inference from Autoregressive Distributed Lag Models." American Journal of Political Science 62 (1): 230–44. https://doi.org/10.1111/ajps.12318.
Rainey, Carlisle. 2017. "Transformation-Induced Bias: Unbiased Coefficients Do Not Imply Unbiased Quantities of Interest." Political Analysis 25 (3): 402–9. https://doi.org/10.1017/pan.2017.11.
Wright, Graham. 2017. "The Political Implications of American Concerns About Economic Inequality." Political Behavior: 1–23. https://doi.org/10.1007/s11109-017-9399-3.