Monday, August 5, 2019
A Hierarchical Regression Analysis Psychology Essay
This study was conducted to determine the predictors of Body Mass Index (BMI). It addressed two research questions. The first was: "How well do the type of chocolate and the frequency of chocolate consumption predict body mass index, after controlling for gender and physical activity?" The second was: "How well do fat percentage and cacao percentage in chocolate explain body mass index, after controlling for the variables in the first research question?" Hierarchical regression analysis was used to identify the predictors. BMI was the outcome variable. Gender, type of chocolate, fat rate in chocolate, cacao rate in chocolate, frequency of chocolate consumption and frequency of physical activity in a week were the predictor variables. The study was conducted with 600 university students.

Method

Participants and the Variables

The sample consisted of 600 Middle East Technical University students; 46.3% (n = 278) were male and 53.7% (n = 322) were female. Participants were selected by convenience sampling. Data were collected in the most crowded places of the university, such as the library, the market area and the dormitory area. The required sample size for multiple regression can be estimated with the rule of thumb N ≥ 50 + 8m, where m is the number of predictors. With seven predictors (after type of chocolate is dummy coded), the required sample size is 106 (50 + 8 × 7). With 600 students, the sample is more than large enough for multiple regression. The questionnaire consisted of the seven items presented in Table 1. Each participant also has an id number. In total, the data file contains six continuous variables (including the id number) and two categorical variables.
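The sample-size rule above is simple arithmetic. Below is a minimal Python sketch, assuming the N ≥ 50 + 8m form quoted in the text, with m counted after dummy coding so that each dummy variable is one predictor. The function name is illustrative.

```python
def required_sample_size(n_predictors: int) -> int:
    """Rule-of-thumb minimum N for a multiple regression: 50 + 8 * m."""
    if n_predictors < 1:
        raise ValueError("need at least one predictor")
    return 50 + 8 * n_predictors


# Predictors after dummy coding: gender, physical activity, milkvsberry,
# milkvspeanut, frequency of consumption, fat rate, cacao rate.
m = 7
n_required = required_sample_size(m)
print(n_required)          # 106
print(600 >= n_required)   # True: the sample of 600 is large enough
```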
Table 1
List of variables and brief descriptions in the data file
Id: Identity number of each participant
BMI: Body Mass Index
Gender: Gender (1: Male; 2: Female)
Type: Type of chocolate (1: Milk; 2: Berry; 3: Peanut)
Fat: Fat rate (%) in chocolate
Cacao: Cacao rate (%) in chocolate
Frequency: Frequency of chocolate consumption (number of chocolates eaten in the last week)
Activity: Frequency of physical activity in a week

Data Analysis Plan

In this study, hierarchical regression will be used to find out how much of the variance in the dependent variable, BMI, the predictors can explain. In hierarchical regression, several models are tested in sequence. In stepwise regression, a statistical criterion determines the order in which predictors enter the model. In hierarchical regression, the researcher decides this order instead. Three models will be tested. The first model will include gender and frequency of physical activity in a week. The second model will add type of chocolate and frequency of chocolate consumption, controlling for gender and physical activity. The third model will add fat percentage and cacao percentage in chocolate, controlling for all the variables in the second model.

To conduct the regression analysis, categorical variables must be recoded. There are three common ways to do this: dummy coding, effects coding and contrast coding. In this study, dummy coding will be used. In dummy coding, a categorical variable with k categories is recoded into k − 1 new variables. Each new variable is scored 1 for one category and 0 otherwise. Only variables with three or more categories need to be recoded. A two-level variable such as gender already forms a single contrast and can be entered as it is.
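Dummy coding can be illustrated with a small, library-free Python sketch. It takes milk chocolate (code 1 in Table 1) as the reference category. The function `dummy_code` and the sample responses are hypothetical and are not data from this study.

```python
def dummy_code(values, reference):
    """Recode a k-level categorical variable into k - 1 dummy (0/1) columns.

    The reference category scores 0 on every dummy column.
    Returns a dict mapping each non-reference level to its 0/1 column.
    """
    levels = sorted(set(values))
    if reference not in levels:
        raise ValueError(f"reference level {reference!r} not found")
    dummies = {}
    for level in levels:
        if level == reference:
            continue
        dummies[level] = [1 if v == level else 0 for v in values]
    return dummies


# Hypothetical responses using the Table 1 codes (1 = milk, 2 = berry, 3 = peanut).
chocolate_type = [1, 2, 3, 1, 3, 2, 1]
cols = dummy_code(chocolate_type, reference=1)
milkvsberry = cols[2]   # 1 for berry, 0 otherwise
milkvspeanut = cols[3]  # 1 for peanut, 0 otherwise
print(milkvsberry)   # [0, 1, 0, 0, 0, 1, 0]
print(milkvspeanut)  # [0, 0, 1, 0, 1, 0, 0]
```

The mean of each dummy column is the proportion of cases in that category. This is why the two chocolate dummies have means of .25 and .27 in the descriptive statistics. For a two-level variable, coding it 1/2 instead of 0/1 changes only the intercept, not the slope.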
In this study there are two categorical variables: gender and type of chocolate. As mentioned above, gender does not need to be recoded, but type of chocolate does. Milk chocolate will be the reference category, and the other two categories will be coded as milkvsberry and milkvspeanut.

Like other multivariate statistical methods, multiple regression rests on several assumptions. All of them should be checked before the analysis is interpreted. The first assumption is normality. Unlike many other multivariate analyses, regression requires the errors (residuals) to be normally distributed, not the variables themselves. Second, multicollinearity, meaning a high level of intercorrelation among the predictor variables, should be checked. Third, the assumption of homoscedasticity should be checked. Homoscedasticity means that the variance of the error term is constant across all values of the predictors. A scatter plot of residuals against predicted values should therefore show no pattern. The fourth assumption is independence: the error term should be independent of the predictors in the model and of the error terms of other cases. The fifth assumption is linearity. Lastly, outliers should be checked to see whether they affect the results. Partial plots, leverage statistics, Cook's D, DFBeta and Mahalanobis distance can be used to identify outliers.

Results

Descriptive Statistics

Table 2 shows the descriptive statistics of the study. There are no missing data. The mean of the dependent variable, BMI, is 24.65 and the standard deviation is 4.48.

Table 2
Descriptive Statistics
Variable | Mean | Std. Deviation | N
body mass index | 24.65 | 4.48 | 600
gender | 1.54 | .50 | 600
physical activity in a week | 2.62 | .74 | 600
milk chocolate vs berry chocolate | .25 | .44 | 600
milk chocolate vs peanut chocolate | .27 | .45 | 600
frequency of chocolate consumption | 4.66 | .73 | 600
fat rate (%) in chocolate | 51.70 | 9.69 | 600
cacao rate (%) in chocolate | 51.95 | 9.96 | 600

Table 3 shows the correlations between the variables. The variable most strongly correlated with BMI is fat rate in chocolate, with a positive and high correlation (r = .64). By contrast, BMI is essentially uncorrelated with gender (r = -.03), physical activity in a week (r = .04) and milk chocolate vs berry chocolate (r = -.03). Moreover, no correlation between the independent variables is higher than .90.

Table 3
Correlation Matrix (Pearson correlations)
Variable | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
(1) body mass index | 1.00
(2) gender | -.03 | 1.00
(3) physical activity in a week | .04 | -.13 | 1.00
(4) milk chocolate vs berry chocolate | -.03 | .03 | -.11 | 1.00
(5) milk chocolate vs peanut chocolate | .23 | -.02 | .12 | -.36 | 1.00
(6) frequency of chocolate consumption | .31 | .12 | .15 | -.05 | .19 | 1.00
(7) fat rate (%) in chocolate | .64 | -.12 | .08 | .02 | .21 | .30 | 1.00
(8) cacao rate (%) in chocolate | .52 | .08 | .03 | -.04 | .22 | .28 | .51 | 1.00

Assumptions

The first assumption of multiple regression to be checked is normality. In regression, the residuals, not the variables, are checked for normality. Normality of residuals can be checked in two ways: with a histogram and with a P-P plot. Figure 1 shows the histogram of the regression standardized residuals. Their frequency distribution is close to the normal curve. Figure 2 shows the P-P plot of the regression standardized residuals, which also indicates that the errors are normally distributed. The first assumption of multiple regression, normality, is therefore not violated.
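A P-P plot compares the cumulative proportion of the sorted standardized residuals with the cumulative probability a normal distribution would assign them. When points fall near the diagonal, the errors are close to normal. The sketch below computes these coordinates in plain Python. It is illustrative only. The standardization (own mean and sample SD) and the (i − 0.5)/n plotting position are assumptions, not necessarily what SPSS uses, and the residuals are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def pp_plot_points(residuals):
    """Return (observed, expected) cumulative probabilities for a normal P-P plot.

    Assumed conventions: residuals are standardized with their own mean and
    sample SD, and the i-th smallest value has observed cumulative
    probability (i - 0.5) / n.
    """
    n = len(residuals)
    if n < 3:
        raise ValueError("need at least three residuals")
    m, s = mean(residuals), stdev(residuals)
    z_sorted = sorted((r - m) / s for r in residuals)
    std_normal = NormalDist()  # mean 0, SD 1
    return [((i - 0.5) / n, std_normal.cdf(z)) for i, z in enumerate(z_sorted, start=1)]


def max_pp_deviation(residuals):
    """Largest distance of a P-P point from the diagonal; small values suggest normal errors."""
    return max(abs(obs - exp) for obs, exp in pp_plot_points(residuals))


# Hypothetical residuals: ideal normal scores vs. a heavy-tailed set (their cubes).
normal_like = [NormalDist().inv_cdf((i - 0.5) / 50) for i in range(1, 51)]
heavy_tailed = [x ** 3 for x in normal_like]
print(round(max_pp_deviation(normal_like), 3))   # close to 0
print(round(max_pp_deviation(heavy_tailed), 3))  # noticeably larger
```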
Figure 1 Histogram of Regression Standardized Residual
Figure 2 P-P Plot of Regression Standardized Residual

The second assumption of multiple regression to be checked is multicollinearity. Multicollinearity can be checked with the correlation matrix, VIF values or tolerance values. No correlation between two independent variables should be higher than .90, and the correlation matrix (Table 3) contains no such correlation. Table 4 shows the collinearity statistics of all three models. VIF values higher than four, or tolerance values lower than .20, indicate multicollinearity. In Table 4, no VIF value is higher than four and no tolerance value is lower than .20. There is therefore no evidence of multicollinearity.

Table 4
Collinearity Statistics
Model | Predictor | Tolerance | VIF
1 | gender | .98 | 1.02
1 | physical activity in a week | .98 | 1.02
2 | gender | .96 | 1.04
2 | physical activity in a week | .94 | 1.06
2 | milk chocolate vs berry chocolate | .87 | 1.15
2 | milk chocolate vs peanut chocolate | .84 | 1.19
2 | frequency of chocolate consumption | .93 | 1.08
3 | gender | .92 | 1.08
3 | physical activity in a week | .94 | 1.06
3 | milk chocolate vs berry chocolate | .86 | 1.17
3 | milk chocolate vs peanut chocolate | .80 | 1.24
3 | frequency of chocolate consumption | .84 | 1.19
3 | fat rate (%) in chocolate | .67 | 1.49
3 | cacao rate (%) in chocolate | .70 | 1.43

The third assumption of multiple regression to be checked is homoscedasticity. A scatter plot of predicted values against residuals is used to check it, and no pattern should be visible on the plot. Figure 4 shows no pattern on the scatter plot, so the homoscedasticity assumption is not violated.

Figure 4 Scatter plot of predicted value and residual

The fourth assumption of multiple regression to be checked is independence of errors. This check concerns the order in which the cases appear in the data. It can be skipped if that order is not meaningful.
In this study, independence was checked with the Durbin-Watson statistic, whose value should lie between 1.5 and 2.5. The Durbin-Watson value of the model is 1.88, so the independence assumption is not violated. The last assumption of multiple regression is linearity. In this study, linearity is assumed not to be violated.

Influential Observations

The data should be checked for outliers, because outliers can produce misleading results. Several methods can be used to detect outliers in multiple regression, such as partial plots, leverage statistics, Cook's D, DFBeta and Mahalanobis distance. Each method relies on a different criterion. Several methods were therefore used, and a case was classified as an outlier only after all of them had been applied. First, the partial plots of the dependent variable with each independent variable were examined (see Figures 5, 6, 7, 8 and 9). Each partial plot shows some cases that could be outliers. However, judging outliers from partial plots is subjective, so the other methods should also be applied before reaching a decision.

Figure 5 Partial Plot of BMI and physical activity in a week
Figure 6 Partial Plot of BMI and milk chocolate vs peanut chocolate
Figure 7 Partial Plot of BMI and frequency of chocolate consumption
Figure 8 Partial Plot of BMI and fat rate in chocolate
Figure 9 Partial Plot of BMI and cacao rate in chocolate

After the partial plots, leverage values were checked. No case has a leverage value higher than .50. According to the leverage values, there is no outlier.
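Leverage and Mahalanobis distance measure the same thing on different scales. In a regression with an intercept, a case's leverage is h = 1/n + D²/(n − 1). The centred leverage reported by SPSS drops the 1/n term, leaving D²/(n − 1). The sketch below assumes that relationship to convert a Mahalanobis distance into centred leverage. It then flags any case above the .50 cutoff used in the text. The function names are hypothetical.

```python
def centered_leverage(mahalanobis_d2: float, n: int) -> float:
    """Centred leverage implied by a Mahalanobis distance: D^2 / (n - 1).

    Assumes an OLS model with an intercept, where h = 1/n + D^2 / (n - 1)
    and the centred value omits the 1/n term.
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    return mahalanobis_d2 / (n - 1)


def high_leverage_cases(values, cutoff=0.50):
    """Return (case, value) pairs whose leverage exceeds the cutoff (.50 in the text)."""
    return [(case, v) for case, v in values.items() if v > cutoff]


n = 600
# Largest Mahalanobis distance in this data set (case 448, Table 6).
lev_448 = centered_leverage(23.72, n)
print(round(lev_448, 2))  # 0.04, the highest centred leverage in Table 5

# Highest centred leverage values listed in Table 5 (case number: value).
table5_highest = {448: 0.04, 384: 0.04, 141: 0.03, 324: 0.03, 592: 0.03}
print(high_leverage_cases(table5_highest))  # [] -> no case above .50
```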
Table 5
Extreme Values of Leverage Test
Extreme values | Rank | Case Number | Centered Leverage Value
Highest | 1 | 448 | .04
Highest | 2 | 384 | .04
Highest | 3 | 141 | .03
Highest | 4 | 324 | .03
Highest | 5 | 592 | .03
Lowest | 1 | 196 | .00
Lowest | 2 | 103 | .00
Lowest | 3 | 535 | .05
Lowest | 4 | 160 | .05
Lowest | 5 | 8 | .05

Next, Cook's distance was checked. A case whose Cook's distance is greater than the mean plus two standard deviations can be regarded as an outlier. In this study the critical value is .008 (.002 + 2 × .003). The maximum Cook's distance is .03, which exceeds this value, so some cases are expected to be flagged as outliers. The boxplot of Cook's distance (Figure 10) shows that cases 499, 438, 449, 236, 284, 484, 37, 354, 137, 97, 324 and 165 could be outliers. On the other hand, according to Cook and Weisberg (1982), only values greater than 1 should be regarded as outliers. Since no case exceeds 1, it can be assumed that there is no outlier.

Figure 10 Boxplot of Cook's distance

After Cook's distance, the DFBeta values of each independent variable were checked. DFBeta shows how much a regression coefficient changes when a given case is deleted. According to Field (2009), a case may be an outlier if the absolute value of its DFBeta is higher than one. According to Stevens (2002), the threshold is two. In this study no case has an absolute DFBeta value higher than one (see Figure 11). According to DFBeta, there is no outlier.

Figure 11 Boxplots of DFBeta values of Independent Variables

Lastly, Mahalanobis distance was checked. Any case whose Mahalanobis distance exceeds the critical chi-square value at α = .001 can be regarded as an outlier. With seven predictors, the critical value at α = .001 is 24.32. Table 6 shows the extreme values for this study, and none is greater than 24.32. According to Mahalanobis distance, there is no outlier.
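Two numeric cutoffs are used in the preceding paragraphs. The first is the Cook's distance rule (mean + 2 SD). The second is the chi-square critical value for Mahalanobis distance, with df equal to the number of predictors, as in the text. Both can be reproduced with a short, dependency-free Python sketch. To avoid external libraries, the chi-square quantile is computed with a hand-written series and bisection. This is an implementation choice, not the method SPSS uses. SciPy's `scipy.stats.chi2.ppf(0.999, 7)` should give the same value.

```python
import math


def chi2_cdf(x: float, df: int) -> float:
    """Chi-square CDF, P(X <= x), via the series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 0.0
    s, t = df / 2.0, x / 2.0
    term = 1.0 / s          # k = 0 term of sum_k t^k / (s (s+1) ... (s+k))
    total = term
    for k in range(1, 10_000):
        term *= t / (s + k)
        total += term
        if term < total * 1e-15:
            break
    # Multiply by t^s * e^(-t) / Gamma(s), computed on the log scale.
    return total * math.exp(s * math.log(t) - t - math.lgamma(s))


def chi2_critical(alpha: float, df: int) -> float:
    """Upper-tail critical value x with P(X > x) = alpha, found by bisection."""
    target = 1.0 - alpha
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < target:   # widen the bracket until it contains the root
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def cooks_cutoff(mean_d: float, sd_d: float) -> float:
    """Mean + 2 SD rule for flagging large Cook's distances."""
    return mean_d + 2.0 * sd_d


critical = chi2_critical(alpha=0.001, df=7)   # df = number of predictors
print(round(critical, 2))                     # 24.32
print(23.72 > critical)                       # False: largest D^2 in Table 6 is below it
print(round(cooks_cutoff(0.002, 0.003), 3))   # 0.008
```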
Table 6
Extreme Values of Mahalanobis Distance
Extreme values | Rank | Case Number | Mahalanobis Distance
Highest | 1 | 448 | 23.72
Highest | 2 | 384 | 20.90
Highest | 3 | 141 | 20.50
Highest | 4 | 324 | 19.15
Highest | 5 | 592 | 17.99
Lowest | 1 | 196 | 2.62
Lowest | 2 | 103 | 2.62
Lowest | 3 | 535 | 2.78
Lowest | 4 | 160 | 2.78
Lowest | 5 | 8 | 2.78

In summary, the partial plots suggest that there could be outliers. However, the leverage values, Cook's distance (judged against the criterion of 1), DFBeta values and Mahalanobis distances show no outlier. Based on these results, it was assumed that there is no outlier.

Regression Results

A hierarchical regression analysis was conducted to identify the predictors of BMI. Three models were examined to determine how much variance each set of predictors explains. Table 7 shows the summary of the three models. The first model is not statistically significant, while the second and third models are. In the first model, gender and physical activity in a week were the predictors. This model explains .2% of the total variance, which is not significant, F(2, 597) = .67, p > .05. In the second model, milk chocolate vs berry chocolate, milk chocolate vs peanut chocolate and frequency of chocolate consumption were added as predictors after controlling for gender and physical activity in a week. These predictors significantly explain an additional 13% of the variance, ΔF(3, 594) = 28.901, p < .001. In the third model, cacao rate (%) in chocolate and fat rate (%) in chocolate were added as predictors. This model controlled for gender, physical activity in a week, milk chocolate vs berry chocolate, milk chocolate vs peanut chocolate and frequency of chocolate consumption. The two added predictors significantly explain an additional 34% of the variance, ΔF(2, 592) = 189.154, p < .001.

Table 7
Regression Analysis Model Summary
Model | R | R² | ΔR² | ΔF | df1 | df2 | Sig. ΔF | Durbin-Watson
1 | .05a | .00 | .00 | .69 | 2 | 597 | .50 |
2 | .36b | .13 | .13 | 28.90 | 3 | 594 | .00 |
3 | .69c | .47 | .34 | 189.15 | 2 | 592 | .00 | 1.879
a. Predictors: (Constant), physical activity in a week, gender
b. Predictors: (Constant), physical activity in a week, gender, milk chocolate vs berry chocolate, frequency of chocolate consumption, milk chocolate vs peanut chocolate
c. Predictors: (Constant), physical activity in a week, gender, milk chocolate vs berry chocolate, frequency of chocolate consumption, milk chocolate vs peanut chocolate, cacao rate (%) in chocolate, fat rate (%) in chocolate
d. Dependent Variable: body mass index

Table 8 shows the coefficients of the hierarchical regression analysis. It includes the significance of each predictor and the part correlations, whose squares give the variance uniquely explained by each predictor. In the first model, none of the predictors significantly predicts the dependent variable, BMI. Neither the model nor its predictors are statistically significant, F(2, 597) = .67, p > .05. In the second model, the added predictors are significant, ΔF(3, 594) = 28.901, p < .001. Milk chocolate vs peanut chocolate (β = .193, p < .001) and frequency of chocolate consumption (β = .283, p < .001) are significant predictors, whereas milk chocolate vs berry chocolate is not (β = .052, p = .204). In the third model, the added predictors are also significant, ΔF(2, 592) = 189.154, p < .001. Both fat rate in chocolate (β = .477, p < .001) and cacao rate in chocolate (β = .242, p < .001) significantly predict BMI.

Table 8
Coefficients of Hierarchical Regression Analysis
Model | Predictor | B | Std. Error | Beta | t | p | Part correlation
1 | (Constant) | 24.419 | .941 | | 25.938 | .000 |
1 | gender | -.232 | .370 | -.026 | -.628 | .530 | -.026
1 | physical activity in a week | .226 | .251 | .037 | .900 | .369 | .037
2 | (Constant) | 17.165 | 1.309 | | 13.110 | .000 |
2 | milk chocolate vs berry chocolate | .539 | .423 | .052 | 1.273 | .204 | .049
2 | milk chocolate vs peanut chocolate | 1.943 | .420 | .193 | 4.629 | .000 | .177
2 | frequency of chocolate consumption | 1.751 | .245 | .283 | 7.135 | .000 | .273
3 | (Constant) | 5.426 | 1.191 | | 4.557 | .000 |
3 | fat rate (%) in chocolate | .221 | .017 | .477 | 13.033 | .000 | .390
3 | cacao rate (%) in chocolate | .109 | .016 | .242 | 6.766 | .000 | .203
a. Dependent Variable: body mass index

Discussion

Two research questions were addressed in this study.
The first research question was: "How well do the type of chocolate and the frequency of chocolate consumption predict body mass index, after controlling for gender and physical activity?" The second was: "How well do fat percentage and cacao percentage in chocolate explain body mass index, after controlling for the variables in the first research question?" A hierarchical regression analysis was conducted to answer these questions. Three models were examined to identify the predictors and their contributions. The first model examined how well gender and physical activity in a week predict the dependent variable. Neither the model nor its predictors significantly predict BMI. The second model addressed the first research question and explains 13% of the variance in BMI. Milk chocolate vs berry chocolate does not significantly explain BMI. Milk chocolate vs peanut chocolate uniquely explains 3% of the variance, and frequency of chocolate consumption uniquely explains 7%. The third model addressed the second research question. It explains 47% of the variance in BMI, 34% of which is added by the two new predictors. Fat rate in chocolate uniquely explains 15% of the variance, and cacao rate in chocolate 4%. Across all models, fat rate in chocolate is the best predictor of BMI, uniquely explaining 15% of the variance. Frequency of chocolate consumption is second (7%) and cacao rate is third (4%).
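The percentages in the discussion follow from two standard formulas. The first is the F test for an R² change: ΔF = (ΔR²/df1) / ((1 − R²)/df2). The second is the unique variance of a predictor, which is its squared part correlation. A minimal Python sketch using the values in Tables 7 and 8 is shown below; the function names are hypothetical. Table 7 rounds R² to two decimals, so recomputing ΔF gives about 189.9 rather than exactly the reported 189.15.

```python
def f_change(r2_change: float, r2_full: float, df1: int, df2: int) -> float:
    """F statistic for the increase in R^2 when df1 predictors are added.

    df2 = N - k - 1, where k is the number of predictors in the larger model.
    """
    return (r2_change / df1) / ((1.0 - r2_full) / df2)


def unique_variance(part_r: float) -> float:
    """Share of outcome variance uniquely explained by a predictor: squared part correlation."""
    return part_r ** 2


# Rounded values from Table 7 (model 3 vs. model 2): N = 600, k = 7.
print(round(f_change(0.34, 0.47, df1=2, df2=600 - 7 - 1), 1))  # 189.9 (reported: 189.15)

# Part correlations from Table 8 (models 2 and 3), as used in the discussion.
part_r = {
    "milk chocolate vs peanut chocolate": 0.177,
    "frequency of chocolate consumption": 0.273,
    "fat rate (%) in chocolate": 0.390,
    "cacao rate (%) in chocolate": 0.203,
}
for name, r in part_r.items():
    print(f"{name}: {unique_variance(r):.0%}")  # 3%, 7%, 15%, 4%
```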