Multicollinearity is a strong linear relationship among more than two explanatory variables in a multiple regression; when the relationship is exact, it violates the Gauss-Markov assumption of no exact multicollinearity.
In other words, multicollinearity is a high correlation among more than two explanatory variables.
We emphasize that the linear relationship (correlation) between explanatory variables has to be strong. It is very common for the explanatory variables of a regression to be correlated, so it should be pointed out that this relationship must be strong, but never perfect, to be considered a case of multicollinearity. The linear relationship would be perfect if the correlation coefficient were 1.
When this strong (but not perfect) linear relationship occurs between only two explanatory variables, we speak of collinearity. It is multicollinearity when the strong linear relationship occurs among more than two independent variables.
The Gauss-Markov assumption of no exact multicollinearity states that the explanatory variables in a sample cannot be constant. Furthermore, there must be no exact linear relationships between explanatory variables (no exact multicollinearity). Gauss-Markov does not allow exact multicollinearity, but it does allow approximate multicollinearity.
There are very particular, usually unrealistic, cases in which the regression variables are completely uncorrelated with each other. In these cases we speak of orthogonality of the explanatory variables. Regressions in the social sciences are well known for exhibiting approximate multicollinearity.
Exact multicollinearity occurs when one or more independent variables are an exact linear combination of other independent variables in the regression.
Gauss-Markov forbids exact multicollinearity because, when it is present, the Ordinary Least Squares (OLS) estimator cannot be obtained.
Mathematically, the estimated beta is expressed in matrix form as:

β̂ = (X'X)⁻¹ X'Y
So, if there is exact multicollinearity, the matrix (X'X) has determinant 0 and is therefore not invertible. Not being invertible means that (X'X)⁻¹ cannot be calculated and, consequently, neither can the estimated beta.
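As a sketch of why the estimator breaks down, the following NumPy example (with hypothetical variables x1, x2, x3, not taken from the text) builds a design matrix in which x3 is an exact multiple of x2; the resulting X'X is singular, so it cannot be inverted:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical design matrix: column x3 = 2 * x2, an exact linear combination.
n = 100
x1 = np.ones(n)                 # intercept column
x2 = rng.normal(size=n)
x3 = 2 * x2                     # exact multicollinearity
X = np.column_stack([x1, x2, x3])

XtX = X.T @ X
print(np.linalg.matrix_rank(XtX))   # 2, not 3: X'X is rank-deficient (singular)

# Inverting a singular matrix fails, so the OLS estimator cannot be computed.
try:
    np.linalg.inv(XtX)
except np.linalg.LinAlgError:
    print("(X'X) is not invertible")
```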
Approximate multicollinearity occurs when one or more independent variables are approximately, but not exactly, a linear combination of other independent variables in the regression.
The variable k represents a random variable that is independent and identically distributed (i.i.d.). The frequency of its observations can be satisfactorily approximated by a standard normal distribution with mean 0 and variance 1. Since it is a random variable, in each observation i the value of k will be different and independent of any previous value.
Mathematically, expressing it in matrix form:

det(X'X) ≈ 0
So, if there is approximate multicollinearity, the determinant of the matrix (X'X) is approximately 0 and the coefficient of determination of the auxiliary regression (regressing one explanatory variable on the others) is very close to 1.
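A small simulation can illustrate this (the variables and parameter values here are hypothetical, not from the text): x3 equals x2 plus the i.i.d. standard-normal noise k, so X'X is nearly singular and the auxiliary regression of x3 on x2 has an R² close to 1.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical regressors: x3 is x2 plus i.i.d. standard-normal noise k,
# so the two columns are almost, but not exactly, linearly dependent.
n = 200
x1 = np.ones(n)                      # intercept column
x2 = rng.normal(loc=10, scale=5, size=n)
k = rng.normal(size=n)               # k ~ N(0, 1), i.i.d.
x3 = x2 + k                          # approximate linear combination
X = np.column_stack([x1, x2, x3])

# Near-singularity shows up as a very large condition number of X'X.
XtX = X.T @ X
print(f"condition number of X'X: {np.linalg.cond(XtX):.1e}")

# Auxiliary regression of x3 on x2: an R^2 near 1 signals multicollinearity.
slope, intercept = np.polyfit(x2, x3, 1)
resid = x3 - (slope * x2 + intercept)
r2 = 1 - resid.var() / x3.var()
print(f"auxiliary R^2: {r2:.3f}")
```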
Multicollinearity can be reduced by eliminating regressors that have a high linear relationship with each other.
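One simple way to apply this in practice (a sketch with made-up data; the 0.9 threshold is a common rule of thumb, not something stated in the text) is to inspect the correlation matrix of the regressors and drop one variable from each highly correlated pair:

```python
import numpy as np

rng = np.random.default_rng(2)

# Made-up data: x2 and x3 are strongly correlated; x4 is unrelated.
n = 500
x2 = rng.normal(size=n)
x3 = x2 + 0.2 * rng.normal(size=n)   # near-duplicate of x2
x4 = rng.normal(size=n)
X = np.column_stack([x2, x3, x4])

# Pairwise correlations between the regressors.
corr = np.corrcoef(X, rowvar=False)
print(np.round(corr, 2))

# Drop one regressor from each pair whose |correlation| exceeds 0.9.
to_drop = {j for i in range(corr.shape[0])
           for j in range(i + 1, corr.shape[1])
           if abs(corr[i, j]) > 0.9}
X_reduced = np.delete(X, sorted(to_drop), axis=1)
print(X_reduced.shape)   # one of the two correlated columns removed
```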