21.1 Activity
1. Conduct an Exploratory Data Analysis
- Classify the variables in the dataset. Using descriptive statistics, what is the data telling us?
- Make a histogram and box plot for \(Y\). Are there any outliers? What is the IQR?
- Create a new variable considering age:
\[ X_{11} = \begin{cases} 1 & \text{if } 21 \leq \text{ age } \leq 30,\\ 2 & \text{if } 31 \leq \text{ age } \leq 40, \\ 3 & \text{if age} > 40 \end{cases} \]
- Compare systolic blood pressure across the age ranges defined by \(X_{11}\) (hint: box plot).
- Consider two groups of people: (1) Weight \(\leq 65\) and (2) Weight \(> 65\). Do you observe a significant difference in systolic blood pressure between them? (hint: test on the mean).
- Considering the two weight groups defined above, is the forearm skinfold statistically different between them?
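The grouping and comparison steps above can be sketched in plain Python. This is only a minimal illustration under stated assumptions: the function names (`age_group`, `iqr_outliers`, `welch_t`) are hypothetical, the numbers are synthetic placeholders rather than the course dataset, ages are assumed to be whole years, and Welch's unequal-variance statistic is one possible choice for the "test on the mean".

```python
import math
import statistics


def age_group(age):
    """Return the X11 category for an age given in whole years."""
    if 21 <= age <= 30:
        return 1
    if 31 <= age <= 40:
        return 2
    if age > 40:
        return 3
    return None  # ages below 21 are not covered by the definition


def iqr_outliers(values):
    """Return (IQR, outliers) using the 1.5 * IQR box-plot rule."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return iqr, [v for v in values if v < low or v > high]


def welch_t(a, b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Synthetic placeholder values, NOT the course dataset.
sbp = [110, 114, 118, 120, 122, 124, 126, 128, 130, 170]
weight = [58, 60, 62, 64, 65, 66, 68, 70, 72, 75]

iqr, outliers = iqr_outliers(sbp)

# Split systolic blood pressure by the weight threshold of 65.
light = [y for y, w in zip(sbp, weight) if w <= 65]
heavy = [y for y, w in zip(sbp, weight) if w > 65]
t_stat, df = welch_t(light, heavy)
```

The statistic and degrees of freedom would then be compared against a \(t\) distribution to obtain a p-value; that step is left out here because the standard library has no \(t\) distribution.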
2. Perform a detailed analysis of the following multiple regression model
\[Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \ldots + \beta_9 X_{9i} + \epsilon_i\]
where
- \(Y\) = systolic blood pressure
- \(X_1\) = age
- \(X_2\) = years in urban area
- \(X_3\) = \(\dfrac{X_2}{X_1}\) = fraction of life in urban area
- \(X_4\) = weight (kg)
- \(X_5\) = height (mm)
- \(X_6\) = chin skinfold
- \(X_7\) = forearm skinfold
- \(X_8\) = calf skinfold
- \(X_9\) = resting pulse rate
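To make the model concrete, here is a minimal sketch of how the derived regressor \(X_3 = X_2 / X_1\) can be built and how least-squares coefficients can be obtained from the normal equations \((X^\top X)\beta = X^\top y\). The helper names `solve` and `ols` are hypothetical, only three regressors are used to keep it short, and the data are synthetic placeholders with a made-up, exactly linear response. In practice a statistics package would be used instead.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols(rows, y):
    """Least-squares coefficients [b0, b1, ...] via the normal equations."""
    X = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


# Synthetic placeholder values, NOT the course dataset.
age = [21, 25, 30, 35, 40, 50, 28, 45]      # X1
years = [1, 10, 5, 20, 8, 25, 14, 3]        # X2
frac = [u / a for u, a in zip(years, age)]  # X3 = X2 / X1

# Made-up response that is exactly linear, so the recovered
# coefficients can be compared with the ones used to build it.
y = [100 + 0.5 * a - 0.2 * u + 10 * f for a, u, f in zip(age, years, frac)]

beta = ols(list(zip(age, years, frac)), y)
fitted = [beta[0] + beta[1] * a + beta[2] * u + beta[3] * f
          for a, u, f in zip(age, years, frac)]
```

Solving the normal equations directly is easy to read but numerically fragile when regressors are strongly related, which is one reason library routines rely on QR or SVD decompositions instead.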
3. Considering the results, answer these questions
- What is the interpretation of the intercept?
- What is the interpretation of the coefficients of \(X_4\) and \(X_9\)?
- Interpret the \(R^2\)
- Can we assume that residuals are normally distributed?
- Can we assume that all the parameters are equal to zero? (hint: test for all variables)
- Can we assume that \(\beta_1\) and \(\beta_8\) are statistically significant? (hint: tests for individual variables)
- Do you recommend the full model to estimate \(Y\)? Why?
- Before fitting any new models, which model would you recommend analyzing next?
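For the \(R^2\) and overall-significance questions, the quantities involved follow from the sums of squares: \(R^2 = 1 - SSE/SST\) and \(F = \dfrac{(SST - SSE)/k}{SSE/(n-k-1)}\) for a model with \(k\) regressors and an intercept. A minimal sketch, with a hypothetical helper `r2_and_f` and a synthetic toy example (one regressor, not the course data):

```python
def r2_and_f(y, fitted, k):
    """Return (R^2, F) for a least-squares fit with k regressors and an intercept."""
    n = len(y)
    ybar = sum(y) / n
    sst = sum((yi - ybar) ** 2 for yi in y)                 # total sum of squares
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # residual sum of squares
    r2 = 1 - sse / sst
    f = ((sst - sse) / k) / (sse / (n - k - 1))
    return r2, f


# Synthetic toy data: y regressed on x = 1..5 gives fitted = 2.2 + 0.6 * x.
y = [2, 4, 5, 4, 5]
fitted = [2.8, 3.4, 4.0, 4.6, 5.2]

r2, f_stat = r2_and_f(y, fitted, k=1)
```

The identity \(SSR = SST - SSE\) used here holds for least-squares fits that include an intercept. The \(F\) value is compared against an \(F\) distribution with \(k\) and \(n-k-1\) degrees of freedom.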
4. Working with regressors
- Make the XY (matrix) scatterplot with Height, Chin, Forearm, Calf, and Pulse
- Can we state that multicollinearity is not a problem in this case? Why?
- Do you recommend transforming any of the variables? How?
- Do you recommend removing any of the variables from the analysis? Why?
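A numerical companion to the matrix scatterplot is the pairwise correlation matrix of the regressors. The sketch below is only illustrative: `pearson` and `correlation_matrix` are hypothetical helpers, the values are synthetic placeholders (Calf is deliberately built as twice Forearm), and the \(|r| > 0.8\) cut-off is an assumed rule of thumb, not a threshold given in the activity.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Return a nested dict with the correlation of every pair of columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names}
            for a in names}


# Synthetic placeholder values, NOT the course dataset.
columns = {
    "Forearm": [4, 5, 6, 7, 8],
    "Calf": [8, 10, 12, 14, 16],   # exactly 2 * Forearm, for illustration
    "Pulse": [70, 64, 72, 60, 68],
}

corr = correlation_matrix(columns)

# Flag strongly related pairs (0.8 is an assumed rule-of-thumb threshold).
names = list(columns)
flagged = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]
           if abs(corr[a][b]) > 0.8]
```

Pairwise correlations only detect relationships between two regressors at a time. Variance inflation factors, which regress each variable on all the others, give a more complete picture.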
5. Final model
- Based on the previous results, fit and present a model that you recommend to estimate \(Y\) (hint: all parameters are significant). Why do you select this model?
- Diagnose the model (\(R^2\), \(F\)-test, \(t\)-tests, residuals)
- Simulate three cases (3 new people). Use this information to estimate their \(Y\)
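Estimating \(Y\) for new people amounts to plugging their values into the fitted equation. A minimal sketch, where the `predict` helper, the variable names, and especially the coefficient values are hypothetical placeholders and not estimates from the data:

```python
def predict(coefs, case):
    """Evaluate b0 + sum(b_j * x_j) for one new case."""
    return coefs["intercept"] + sum(
        b * case[name] for name, b in coefs.items() if name != "intercept"
    )


# Placeholder coefficients, NOT fitted values; replace with the final model's.
coefs = {"intercept": 60.0, "weight": 1.0, "fraction": -20.0}

# Three hypothetical new people.
new_people = [
    {"weight": 60.0, "fraction": 0.5},
    {"weight": 70.0, "fraction": 0.25},
    {"weight": 55.0, "fraction": 1.0},
]

estimates = [predict(coefs, person) for person in new_people]
```

The simulated cases should stay within the range of the observed data, since predictions outside that range are extrapolations the model cannot support.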
6. Conclusions
- Present your final comments and conclusions on the overall project.