Stationarity

The Dickey-Fuller test needs to be done on each dependent and explanatory variable individually. Regressing on non-stationary time series can lead to a spurious regression relationship between the dependent and explanatory variables. Test for a unit root: \[y_t = \alpha + \rho y_{t-1} + \epsilon_t\]

  • Null Hypothesis: \(H_0: \rho = 1\)
  • Alternative Hypothesis: \(H_1: \rho < 1\)

Subtracting \(y_{t-1}\) from both sides gives the equivalent equation below, where \(\theta = \rho - 1\); by testing the significance of \(y_{t-1}\) (whether \(\theta = 0\)), we can conclude whether the time series has a unit root: \[\Delta y_t = \alpha + \theta y_{t-1} + \epsilon_t\]

However, when the true \(\theta = 0\), the sampling distribution of \(\hat{\theta}\) is not normal (the Central Limit Theorem no longer applies), and therefore the standard t-test on the OLS regression is not valid in this case. The asymptotic distribution of the t statistic is known as the Dickey-Fuller distribution. The derivation is rather mathematically intense, but the Dickey-Fuller test can be run with R packages, as sketched below.
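
A minimal sketch, assuming the tseries package is installed; the series y is simulated here purely for illustration:

```r
library(tseries)

set.seed(42)
y <- cumsum(rnorm(200))   # a random walk, which has a unit root by construction

# Augmented Dickey-Fuller test; H0: unit root, H1: stationary
adf.test(y)               # a large p-value means we cannot reject the unit root
```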


Auto-correlation

This test needs to be done on the model residuals. The variance of the sampling distribution of the OLS \(\hat{\beta}\) is not correct when there is auto-correlation in the residuals. The t-test for \(\beta\) is therefore not valid.

  • Durbin-Watson statistic: The simplest way to diagnose auto-correlation in the error term is to plot the ACF/PACF of the residuals in R. The Durbin-Watson statistic can also be used when there is no lagged dependent variable in the equation.

  • Breusch–Godfrey test: This test regresses the residuals on the explanatory variables and several lags of the residuals, and remains valid when the equation contains a lagged dependent variable or when the auto-correlation is of higher order. Both tests are sketched below.
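
A minimal sketch of both tests using the lmtest package; the data and fitted model are simulated and the variable names are illustrative:

```r
library(lmtest)

set.seed(42)
x   <- rnorm(100)
y   <- 1 + 2 * x + rnorm(100)
fit <- lm(y ~ x)

# Visual diagnosis: ACF/PACF of the residuals
acf(residuals(fit))
pacf(residuals(fit))

dwtest(fit)             # Durbin-Watson; H0: no first-order auto-correlation
bgtest(fit, order = 2)  # Breusch-Godfrey; H0: no auto-correlation up to lag 2
```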

Homoscedasticity

The test needs to be done on the model residuals. The variance of the sampling distribution of the OLS \(\hat{\beta}\) is not correct when there is heteroscedasticity. The t-test for \(\beta\) is therefore not valid.

  • Breusch–Pagan test: For a linear equation \[y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \epsilon\] then
    • \(Var(\epsilon|x) = \sigma^2\) under homoscedasticity
    • \(Var(\epsilon|x) = \sigma^2 f(x)\) under heteroscedasticity
    To test this, we regress the squared residuals \(\hat{\epsilon}^2\) on the explanatory variables \(x\): \[\hat{\epsilon}^2 = a_0 + a_1 x_1 + a_2 x_2 + \dots + a_k x_k + e\] Then we use an F-test for hypothesis testing (see the R sketch after this list):
    • Null Hypothesis \(H_0: a_1 = a_2 = \dots = a_k = 0\)
    • Alternative Hypothesis \(H_1: a_i \neq 0\) for some \(i\)
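
A minimal sketch, first running the auxiliary regression by hand and then using lmtest::bptest; the heteroscedastic data are simulated for illustration:

```r
library(lmtest)

set.seed(42)
x1  <- rnorm(200)
x2  <- rnorm(200)
y   <- 1 + 2 * x1 - x2 + rnorm(200, sd = exp(0.5 * x1))  # variance depends on x1
fit <- lm(y ~ x1 + x2)

# By hand: regress squared residuals on the explanatory variables;
# the overall F-statistic tests H0: a1 = a2 = 0
aux <- lm(residuals(fit)^2 ~ x1 + x2)
summary(aux)

# Packaged version (reports the LM form of the statistic by default)
bptest(fit)
```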

Multi-collinearity

When multi-collinearity exists in the model, the model is not able to distinguish the effects of the correlated variables, and therefore the variances of the \(\beta\) sampling distributions will be huge.

  • Model is significant but \(\beta\)s are not:
    The usual symptom is that the model is significant (small p-value for the F-test) but none of the \(\beta\)s are significant, because the model is not able to discern the effects of the individual variables.
  • Correlation matrix:
    In some cases, multi-collinearity can also be observed in the correlation matrix of the explanatory variables. However, if one variable is close to a linear combination of two or more other variables, this cannot be seen in the pairwise correlation matrix.
  • Variance inflation factor (VIF):
    This method addresses the weakness of only looking at the correlation matrix. For each explanatory variable, regress it on all the other explanatory variables, then calculate \[VIF_i = \frac{1}{1-R_i^2}\]
    A high VIF (a common rule of thumb is above 10) indicates multi-collinearity; see the sketch after this list.
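
A minimal sketch computing the VIF by hand and with car::vif; x3 is constructed as a near-linear combination of x1 and x2 purely for illustration:

```r
library(car)

set.seed(42)
x1  <- rnorm(100)
x2  <- rnorm(100)
x3  <- x1 + x2 + rnorm(100, sd = 0.1)  # near-linear combination of x1 and x2
y   <- 1 + x1 + x2 + x3 + rnorm(100)
fit <- lm(y ~ x1 + x2 + x3)

# By hand for x3: regress it on the other explanatory variables
r2 <- summary(lm(x3 ~ x1 + x2))$r.squared
1 / (1 - r2)   # VIF for x3

vif(fit)       # VIFs for all explanatory variables at once
```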

Normality of the Residuals

Recall from Derive OLS Estimators in Matrix Form that \(\hat{\beta} = \beta+(X^{'}X)^{-1}X^{'} \epsilon\). If the errors \(\epsilon\) are normally distributed, \(\hat{\beta}\) is exactly normal in finite samples; otherwise the t-tests on \(\beta\) rely on asymptotic normality, so checking the normality of the residuals matters especially for small samples.

  • Quantile-quantile plot (QQ plot): This method is very intuitive. One can plot the normalized residuals at each quantile against the corresponding quantiles of the standard normal distribution. We can get a sense of whether there is strong non-normality in the errors by eyeballing the plot.
  • Jarque–Bera test: Tests whether the skewness and excess kurtosis of the residuals jointly match those of a normal distribution (both zero).
  • Kolmogorov–Smirnov test: Compares the empirical distribution of the (standardized) residuals with the standard normal CDF. All three checks are sketched below.
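
A minimal sketch of all three checks; jarque.bera.test comes from the tseries package, and the fitted model is simulated for illustration:

```r
library(tseries)   # provides jarque.bera.test

set.seed(42)
x   <- rnorm(200)
y   <- 1 + 2 * x + rnorm(200)
fit <- lm(y ~ x)
res <- residuals(fit)

# QQ plot: points should lie close to the reference line under normality
qqnorm(res)
qqline(res)

# Jarque-Bera: H0 is that skewness and excess kurtosis are both zero
jarque.bera.test(res)

# Kolmogorov-Smirnov against N(0, 1) on standardized residuals
# (note: estimating mean/sd from the data makes the p-value only approximate)
ks.test(as.numeric(scale(res)), "pnorm")
```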