When we run a regression, we hope to generalize the model fitted on the sample to the entire population. To do so, the data must meet several assumptions of the multiple linear regression model. If these assumptions are violated, we cannot safely generalize our conclusions to the target population, because the results may be biased or misleading. So what are the assumptions, and how do we check them?

## Multiple Linear Regression Assumptions

#### Variable Type

This is a straightforward assumption: the outcome (dependent) variable must be continuous. The independent variables can be continuous or dichotomous, meaning they have only two categories.

#### Linearity

The outcome (dependent) variable should be linearly related to each of the predictors. In other words, we assume that the relationship we’re modeling is a linear one.
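One rough way to probe linearity for a single predictor is to compare a straight-line fit with a fit that also allows curvature. The sketch below is only an illustration under that assumption, not a method from this article: the helper names `r_squared` and `linearity_check` are hypothetical, and a residual plot is the more common visual check.

```python
import numpy as np

def r_squared(y, fitted):
    """Share of the variance in y explained by the fitted values."""
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

def linearity_check(x, y):
    """Return (R^2 of a linear fit, R^2 of a quadratic fit) for one predictor."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    linear_fit = np.polyval(np.polyfit(x, y, 1), x)
    quadratic_fit = np.polyval(np.polyfit(x, y, 2), x)
    return r_squared(y, linear_fit), r_squared(y, quadratic_fit)

# Illustrative data: a curved relationship that a straight line fits poorly.
x = np.linspace(-3, 3, 50)
r2_linear, r2_quadratic = linearity_check(x, x ** 2)
print(f"linear R^2 = {r2_linear:.3f}, quadratic R^2 = {r2_quadratic:.3f}")
```

If the quadratic fit explains much more variance than the linear fit, the linear form is probably a poor description of the relationship.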

#### Normality

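The residuals should be approximately normally distributed. This can be checked by confirming that both skewness and kurtosis fall between -1 and +1, or by looking at a histogram or a quantile-quantile (Q-Q) plot of the residuals. On a Q-Q plot, points that fall close to the diagonal reference line suggest normality.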
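The skewness and kurtosis rule of thumb can be sketched in plain Python. This is a minimal illustration that assumes "kurtosis" means excess kurtosis (a normal distribution scores 0) and uses simple population moments. The function names and the `limit` parameter are hypothetical.

```python
def central_moment(values, order):
    """Population central moment of the given order."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** order for v in values) / n

def skewness(values):
    """Third standardized moment; 0 for a perfectly symmetric sample."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Fourth standardized moment minus 3, so a normal distribution gives 0."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0

def looks_roughly_normal(values, limit=1.0):
    """Apply the -1 to +1 rule of thumb to both statistics."""
    return abs(skewness(values)) <= limit and abs(excess_kurtosis(values)) <= limit

# Hypothetical residuals, used only to show the calls.
residuals = [-1.2, -0.4, 0.1, 0.3, -0.2, 0.9, 0.5, -0.7, 0.2, 0.5]
print(skewness(residuals), excess_kurtosis(residuals), looks_roughly_normal(residuals))
```

Statistical packages often apply small-sample corrections to these statistics, so their values can differ slightly from the ones computed here.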

#### Multicollinearity

Multicollinearity occurs when the independent variables are too highly correlated with each other. We can assess it by looking at the correlation matrix and checking whether any pair of predictors is correlated above 0.7, which means they share roughly half or more of their variance (0.7² = 0.49).

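We can also check that the Variance Inflation Factor (VIF) is low (below a cutoff somewhere between 3 and 10, depending on the rule of thumb used) or that the Tolerance, which is 1/VIF, is high (above a cutoff somewhere between 0.1 and 0.3).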
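Both checks can be sketched with NumPy. The example relies on the standard result that the VIF of each predictor equals the corresponding diagonal entry of the inverse of the predictors' correlation matrix, and that Tolerance is 1/VIF. The function name `vif_and_tolerance` and the simulated data are assumptions made for illustration.

```python
import numpy as np

def vif_and_tolerance(X):
    """X: 2-D array with one column per predictor (no intercept column).

    Returns the correlation matrix, the VIF of each predictor,
    and the Tolerance (1 / VIF) of each predictor.
    """
    corr = np.corrcoef(X, rowvar=False)
    vif = np.diag(np.linalg.inv(corr))
    tolerance = 1.0 / vif
    return corr, vif, tolerance

# Simulated predictors: x2 is nearly a copy of x1, x3 is unrelated.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.1 * rng.normal(size=200)
x3 = rng.normal(size=200)
X = np.column_stack([x1, x2, x3])

corr, vif, tolerance = vif_and_tolerance(X)
print(np.round(corr, 2))
print("VIF:", np.round(vif, 2))
print("Tolerance:", np.round(tolerance, 3))
```

In this setup the first two predictors should show a pairwise correlation well above 0.7 and large VIF values, while the third should stay close to a VIF of 1.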

#### Homoscedasticity

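The variance of the residuals should be constant across the range of predicted values. When the residuals are plotted against the predicted values, the scatter should show no heterogeneous pattern, such as a funnel shape that widens or narrows.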
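Besides inspecting the residual plot, a numeric check is possible. The sketch below is a swapped-in technique, not something described in this article: a Breusch-Pagan style statistic in its studentized (Koenker) form, computed as n times the R² from regressing the squared residuals on the predictors. The function name `breusch_pagan_lm` and the simulated data are hypothetical.

```python
import numpy as np

def breusch_pagan_lm(X, y):
    """Return n * R^2 from regressing squared OLS residuals on the predictors.

    X may be 1-D (one predictor) or 2-D (one column per predictor).
    Large values suggest that the residual variance changes with the predictors.
    """
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])

    # Step 1: ordinary least squares fit of y on the predictors.
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    squared_residuals = (y - design @ beta) ** 2

    # Step 2: auxiliary regression of the squared residuals on the same predictors.
    gamma, *_ = np.linalg.lstsq(design, squared_residuals, rcond=None)
    fitted = design @ gamma
    ss_res = np.sum((squared_residuals - fitted) ** 2)
    ss_tot = np.sum((squared_residuals - squared_residuals.mean()) ** 2)
    return len(y) * (1.0 - ss_res / ss_tot)

# Simulated data whose noise grows with x (a funnel-shaped residual plot).
rng = np.random.default_rng(1)
x = np.linspace(1, 10, 200)
y = 2 + 3 * x + rng.normal(size=200) * x
print("LM statistic:", round(breusch_pagan_lm(x, y), 2))
```

Under constant variance the statistic approximately follows a chi-square distribution with as many degrees of freedom as there are predictors. With one predictor, values above about 3.84 point to heteroscedasticity at the 5% level.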

#### Independent Errors or No Autocorrelation

The assumption of independent errors is that successive residuals should be independent. That means there’s no pattern to the residuals: they aren’t highly correlated, and there are no long runs of positive or negative residuals. When successive residuals are correlated, we refer to this condition as autocorrelation; it frequently occurs when the data are collected over a period of time.
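Two simple numeric checks on residuals kept in collection order are sketched below. The first is the Durbin-Watson statistic, a common diagnostic that this article does not name, so treat it as a swapped-in technique. The second counts the longest run of same-signed residuals. Both function names are hypothetical.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squares.

    Values near 2 suggest no first-order autocorrelation, values toward 0
    suggest positive autocorrelation, and values toward 4 suggest negative
    autocorrelation.
    """
    numerator = sum((residuals[i] - residuals[i - 1]) ** 2
                    for i in range(1, len(residuals)))
    denominator = sum(r ** 2 for r in residuals)
    return numerator / denominator

def longest_sign_run(residuals):
    """Length of the longest streak of consecutive residuals with the same sign."""
    longest = current = 0
    previous_sign = 0
    for r in residuals:
        sign = (r > 0) - (r < 0)
        if sign != 0 and sign == previous_sign:
            current += 1
        else:
            current = 1 if sign != 0 else 0
        previous_sign = sign
        longest = max(longest, current)
    return longest

# Residuals that flip sign at every step: strong negative autocorrelation.
alternating = [1, -1, 1, -1]
print(durbin_watson(alternating))     # 3.0
print(longest_sign_run(alternating))  # 1
```

A Durbin-Watson value well below 2, together with long same-sign runs, is the typical signature of time-ordered data with positively autocorrelated errors.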