17.4 R-squared and Adjusted R-squared
- \(R^2\) is the coefficient of determination. It represents the proportion of variation in \(y\) (about its mean) explained by the multiple linear regression model with predictors \(x_1, x_2, \ldots\).
\[R^2=\frac{SSR}{SSTO}=1-\frac{SSE}{SSTO}\]
- \(R^2\) always increases (or stays the same) as more predictors are added to a multiple linear regression model, even if the predictors added are unrelated to the response variable. Thus, by itself, \(R^2\) cannot be used to help us identify which predictors should be included in a model and which should be excluded.
\[\text{Adjusted-} R^2 = 1-\left(\frac{n-1}{n-(k+1)}\right)(1-R^2)\]
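Here \(n\) is the number of observations and \(k\) is the number of predictors, so \(k+1\) counts the estimated coefficients including the intercept. Because the factor \(\frac{n-1}{n-(k+1)}\) grows as \(k\) grows, \(\text{Adjusted-}R^2\) does not necessarily increase as more predictors are added, and it can be used to help us identify which predictors should be included in a model and which should be excluded. Simply stated, when comparing two models used to predict the same response variable, we generally prefer the model with the higher value of \(\text{Adjusted-}R^2\).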
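The two formulas above can be checked numerically. The following is a minimal sketch in plain Python, not a reference implementation: the helper names `solve` and `r2_and_adjusted`, the simulated data, and the variable `junk` (a predictor generated independently of the response) are all assumptions made for illustration. It fits least squares through the normal equations, then compares a one-predictor model with a model that also includes the unrelated predictor.

```python
import random

def solve(M, b):
    """Solve the linear system M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(M, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x

def r2_and_adjusted(columns, y):
    """Fit least squares with an intercept; return (R^2, adjusted R^2).

    columns is a list of k predictor lists, each of length n.
    """
    n, k = len(y), len(columns)
    p = k + 1  # number of coefficients, including the intercept
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(row[a] * row[b] for row in design) for b in range(p)]
           for a in range(p)]
    xty = [sum(row[a] * yi for row, yi in zip(design, y)) for a in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in design]
    y_bar = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # unexplained variation
    ssto = sum((yi - y_bar) ** 2 for yi in y)                # total variation about the mean
    r2 = 1 - sse / ssto
    adj = 1 - (n - 1) / (n - (k + 1)) * (1 - r2)
    return r2, adj

# Simulated data (an assumption for this sketch): y depends on x1 only.
random.seed(0)
n = 50
x1 = [random.gauss(0, 1) for _ in range(n)]
junk = [random.gauss(0, 1) for _ in range(n)]  # unrelated to y by construction
y = [2.0 + 3.0 * a + random.gauss(0, 1) for a in x1]

r2_small, adj_small = r2_and_adjusted([x1], y)
r2_big, adj_big = r2_and_adjusted([x1, junk], y)

print("x1 only:   R^2 =", r2_small, " adjusted R^2 =", adj_small)
print("x1 + junk: R^2 =", r2_big, " adjusted R^2 =", adj_big)
```

In a run of this sketch, \(R^2\) for the larger model should be at least as large as for the smaller one, while \(\text{Adjusted-}R^2\) may move in either direction depending on how much the extra predictor happens to reduce \(SSE\) in the sample.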