# Quantitative Assignment, MAE306 Essay Example

QUANTITATIVE ASSIGNMENT, MAE306

Trimester 2 2013

1. Descriptive Statistics

1.1 Weight of Newborn Babies

The mean weight of newborn babies (bwght) is 118.6996 and the standard deviation is 20.3540. The median weight is 120, the maximum is 271, and the minimum is 23. The distribution is nearly symmetric (skewness of about -0.1459), but it is not normal. Its kurtosis of about 6.1476 is well above the normal value of 3, and the Jarque-Bera test rejects normality (P < 0.001). The variable has 1388 observations.

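The variable male equals 1 if the child is male. Its mode is 1 and its mean is 0.5209, so slightly more than half of the children (about 52.1%) are male. Because male is a binary indicator, it cannot be normally distributed. Its skewness of about -0.0836 simply reflects the nearly even split between boys and girls. The variable has 1388 observations.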

The birth order of the children (parity) has a mean of 1.6326. Both its median and its mode (the most prevalent value) are 1. The minimum birth order is 1, the maximum is 6, and the distribution is positively skewed (skewness of about 1.6299). The variable has 1388 observations.

1.2 Family Income

Family income (faminc) is measured in thousands of dollars. Its average is about \$29,026.66, with a standard deviation of \$18,739.28, and its median is \$27,500.00. The maximum family income is \$65,000 and the minimum is \$500. The distribution is moderately positively skewed (skewness of about 0.6176), and the Jarque-Bera test rejects normality (P < 0.001). The variable has 1388 observations.

1.3 Average Number of Packs of Cigarettes Smoked per Day

The mean number of packs of cigarettes smoked per day during pregnancy (packs) is about 0.1044, with a standard deviation of 0.2986. The maximum is 2.5 packs per day and the minimum is zero. Because the median is also zero, at least half of the mothers did not smoke at all during pregnancy. The distribution of packs is strongly positively skewed (skewness of about 3.5604). The variable has 1388 observations.

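Descriptive Statistics

| | FAMINC | BWGHT | PARITY | MALE | PACKS |
|---|---|---|---|---|---|
| Mean | 29.02666 | 118.6996 | 1.632565 | 0.520893 | 0.104359 |
| Median | 27.50000 | 120.0000 | 1.000000 | 1.000000 | 0.000000 |
| Maximum | 65.00000 | 271.0000 | 6.000000 | 1.000000 | 2.500000 |
| Minimum | 0.500000 | 23.00000 | 1.000000 | 0.000000 | 0.000000 |
| Std. Dev. | 18.73928 | 20.35396 | 0.894027 | 0.499743 | 0.298634 |
| Skewness | 0.617620 | -0.145866 | 1.629925 | -0.083647 | 3.560448 |
| Observations | 1388 | 1388 | 1388 | 1388 | 1388 |
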
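
The normality statements above can be checked against the reported moments. The Python sketch below recomputes the Jarque-Bera statistic as JB = n/6 (S² + (K - 3)²/4), which follows a chi-square(2) distribution under normality. It assumes that the table uses population (divide-by-n) moments, which is taken here to be the EViews convention. The helper names are illustrative only.

```python
import math


def moments(data):
    """Skewness and kurtosis from population (divide-by-n) moments.

    Assumption: this mirrors the convention of the descriptive-statistics
    table; it is an illustration, not the package's own code.
    """
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


def jarque_bera(n, skew, kurt):
    """JB = n/6 * (S^2 + (K - 3)^2 / 4); chi-square(2) under normality."""
    return n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)


def chi2_df2_pvalue(stat):
    """Upper-tail p-value for chi-square(2), which has the closed form exp(-stat / 2)."""
    return math.exp(-stat / 2)


# Skewness and kurtosis reported in the appendix table (n = 1388 for every variable)
reported = {
    "faminc": (0.617620, 2.473396),
    "bwght": (-0.145866, 6.147639),
    "male": (-0.083647, 1.006997),
}
for name, (s, k) in reported.items():
    jb = jarque_bera(1388, s, k)
    print(f"{name}: JB = {jb:.2f}, p = {chi2_df2_pvalue(jb):.2e}")
```

The recomputed statistics should match the Jarque-Bera row in the appendix to rounding. The tiny p-values show why a skewness near zero is not enough to call a distribution normal.
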

2. Model Estimation

The following model estimates the effects of family income (faminc), the birth order of the child (parity), the child's sex (male = 1 if the child is male), and the average number of packs of cigarettes smoked per day during pregnancy (packs) on the weight of newborn babies (bwght):

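$$
\widehat{\log(bwght)} = \underset{(0.0219)}{4.6756} + \underset{(0.0101)}{0.0262}\,male + \underset{(0.0057)}{0.0147}\,parity + \underset{(0.0056)}{0.0181}\,\log(faminc) - \underset{(0.0171)}{0.0837}\,packs \tag{1}
$$

Standard errors are in parentheses; $n = 1388$, $R^2 = 0.0350$.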

Each coefficient, including the constant, is statistically significant (P < 0.01). The p-values are as follows:

- constant 4.6756 (β0), P < 0.0001
- male 0.0262 (β1), P = 0.0094
- parity 0.0147 (β2), P = 0.0094
- log(faminc) 0.0181 (β3), P = 0.0013
- packs -0.0837 (β4), P < 0.0001

The model as a whole is also statistically significant (F-statistic = 12.5544, P < 0.01). However, the set of independent variables accounts for only about 3.50% of the variation in log(bwght) (R-squared = 0.0350).
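
The p-values above can be recovered from the t-statistics in the appendix OLS output. The Python sketch below is an approximation. It uses the standard normal distribution in place of the t distribution, which is assumed to be adequate with about 1383 residual degrees of freedom, so the last digit may differ slightly from the reported values.

```python
from statistics import NormalDist


def two_sided_p(t_stat):
    """Two-sided p-value using the standard normal as a large-sample stand-in
    for the t distribution (assumption: df is large, here about 1383)."""
    return 2 * (1 - NormalDist().cdf(abs(t_stat)))


# t-statistics copied from the OLS output in the appendix
t_stats = {"male": 2.600832, "parity": 2.600231, "log(faminc)": 3.232601, "packs": -4.890393}
for name, t in t_stats.items():
    print(f"{name}: t = {t:.3f}, p ~ {two_sided_p(t):.4f}")
```
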

The coefficients have different interpretations. Because male is a binary indicator, its coefficient compares boys with girls. Holding the other factors fixed, a male newborn is estimated to weigh about 2.62% more than a female newborn. Equivalently, a female newborn weighs about 2.62% less.

Each one-unit increase in parity (being one position later in the birth order) is associated with an increase of about 1.47% in the birth weight of the newborn, holding the other factors fixed.

Because both bwght and faminc enter in logarithms, the coefficient 0.0181 is an elasticity. A 1% increase in family income is associated with an increase of about 0.018% in the birth weight of newborn babies, holding the other factors fixed. Since the relationship is positive, higher family income is associated with heavier newborns, and lower family income with lighter ones.

Smoking one more pack of cigarettes per day during pregnancy is associated with a decrease of about 8.37% in the birth weight of the newborn, holding the other factors fixed. Conversely, smoking one pack less per day is associated with an increase of about 8.37%.
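
The interpretations above read each log-level coefficient β as a change of 100·β percent, an approximation that works well for small β. The Python sketch below compares that reading with the exact change, 100·(exp(β) − 1). It also converts the log(faminc) elasticity into the exact effect of a 10% rise in income. The 10% figure is an arbitrary illustrative choice.

```python
import math


def approx_pct(beta):
    """Log approximation used in the text: a coefficient beta is read as 100 * beta percent."""
    return 100 * beta


def exact_pct(beta):
    """Exact percentage change implied by a log-level coefficient: 100 * (exp(beta) - 1)."""
    return 100 * (math.exp(beta) - 1)


def elasticity_pct(elasticity, pct_change_x):
    """Exact percentage change in y for a pct_change_x percent change in x in a log-log model."""
    return 100 * ((1 + pct_change_x / 100) ** elasticity - 1)


# OLS coefficients from equation (1)
for name, b in {"male": 0.026241, "parity": 0.014729, "packs": -0.083728}.items():
    print(f"{name}: approx {approx_pct(b):.2f}%, exact {exact_pct(b):.2f}%")

# log(faminc) enters as an elasticity: effect of a 10% rise in family income
print(f"faminc +10%: {elasticity_pct(0.018050, 10):.3f}%")
```

For packs, the exact calculation gives a somewhat smaller drop in birth weight than the 8.37% approximation. For male and parity, the two readings are almost identical.
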

3. Effects of the Mother’s and Father’s Years of Education

Adding motheduc (the mother’s years of schooling) to the regression leaves the estimated constant and the other coefficients essentially unchanged (estimate ± standard error):

- constant: 4.6756 ± 0.0219
- male: 0.0262 ± 0.0101
- parity: 0.0147 ± 0.0057
- log(faminc): 0.0181 ± 0.0056
- packs: -0.0837 ± 0.0171

The p-value of motheduc (0.7306) is greater than the 0.05 significance level. The null hypothesis H0: β5 = 0 (motheduc has no partial effect on log(bwght), holding the other regressors fixed) is therefore not rejected.

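Adding fatheduc (the father’s years of schooling) together with motheduc makes the coefficient on log(faminc) statistically insignificant (t = 1.908, P > 0.05). At the same time, the six regressors explain a larger share of the variation in log(bwght): R-squared rises from 0.0350 to 0.0420, and adjusted R-squared rises from 0.0322 to 0.0372. Because fatheduc is missing for some families, this regression uses only 1191 of the 1388 observations. The comparison with the original model is therefore not made on an identical sample.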
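
Adjusted R-squared penalizes extra regressors, so it is a fairer basis for comparing the two specifications than raw R-squared. The Python sketch below reproduces the adjusted values in the appendix from the reported R-squared, the sample size n, and the number of slope regressors k, using the standard formula 1 − (1 − R²)(n − 1)/(n − k − 1).

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for a model with k slope regressors plus an intercept."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# R-squared values and sample sizes from the appendix outputs
base = adjusted_r2(0.035038, n=1388, k=4)      # male, parity, log(faminc), packs
extended = adjusted_r2(0.042026, n=1191, k=6)  # adds motheduc and fatheduc
print(f"base: {base:.6f}, extended: {extended:.6f}")
```

The increase survives the adjustment. Because the two fits use different samples, however, the comparison remains indicative rather than a formal test.
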

First, a visual inspection of the scatter plot of the residuals against the fitted values (Figure 1 in the appendix) suggests that heteroskedasticity may be present: the envelope of the residuals is somewhat uneven along the x-axis (bwghtf). However, White’s test, whose null hypothesis is homoskedasticity, provides no evidence of heteroskedasticity. The null is not rejected because the p-value (0.9570) exceeds the 0.05 significance level. Second, the Breusch-Pagan-Godfrey test, whose null hypothesis is that the conditional variance of the error is constant, also fails to reject the null (P > 0.05). This again suggests that there is no heteroskedasticity.
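
The Obs*R-squared versions of both tests are compared with a chi-square distribution whose degrees of freedom equal the number of auxiliary regressors (13 for White, 4 for Breusch-Pagan-Godfrey). The appendix output does not print the p-values. The Python sketch below computes the chi-square upper tail from the series for the regularized incomplete gamma function, an implementation choice made here for a dependency-free example.

```python
import math


def chi2_sf(stat, df):
    """Upper-tail probability of a chi-square(df) variable, via the series for the
    regularized lower incomplete gamma function P(df/2, stat/2)."""
    s, x = df / 2, stat / 2
    if x <= 0:
        return 1.0
    term = total = 1 / s
    n = 0
    while term > total * 1e-15:
        n += 1
        term *= x / (s + n)
        total += term
    lower = total * math.exp(-x + s * math.log(x) - math.lgamma(s))
    return 1 - lower


# Obs*R-squared statistics from the appendix heteroskedasticity tests
print(f"White (chi2(13)): p = {chi2_sf(5.681028, 13):.4f}")
print(f"Breusch-Pagan-Godfrey (chi2(4)): p = {chi2_sf(1.133884, 4):.4f}")
```

The White value should come out close to the 0.9570 quoted in the text, and the Breusch-Pagan-Godfrey value is also far above 0.05.
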

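The Durbin-Watson statistic is 1.9313, close to 2.0, which is the value that corresponds to no first-order autocorrelation in the residuals. There is therefore no evidence of autocorrelation. Because these are cross-sectional data, the statistic depends on the arbitrary order of the observations, and serial correlation is not usually a primary concern.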
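
The Durbin-Watson statistic is the sum of squared successive residual differences divided by the residual sum of squares. It is approximately 2(1 − ρ̂), where ρ̂ is the first-order residual autocorrelation. The Python sketch below implements both relations. The toy residual series is hypothetical.

```python
def durbin_watson(resid):
    """Durbin-Watson statistic: sum of squared successive differences over the residual sum of squares."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    den = sum(e ** 2 for e in resid)
    return num / den


def implied_rho(dw):
    """Approximate first-order residual autocorrelation implied by DW (DW ~ 2(1 - rho))."""
    return 1 - dw / 2


# Hypothetical residuals that alternate in sign, i.e. strong negative autocorrelation
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))
print(f"rho implied by DW = 1.931302: {implied_rho(1.931302):.4f}")
```

A DW of 1.9313 corresponds to an implied ρ̂ of only about 0.03.
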

The scatter plot of each variable against bwght suggests that faminc, parity, and packs have non-linear effects on the birth weight of newborn babies. In addition, the correlation coefficients (r) between log(bwght) and each regressor are close to zero, ranging from -0.1407 (packs) to 0.0992 (log(faminc)). This indicates at most a weak linear relationship. A small r alone does not prove that a relationship is non-linear, however. Because male is a binary indicator, it can only shift the level of log(bwght), so its effect is linear by construction.

Figure 4.1

Correlation Matrix

| | LOG(BWGHT) | MALE | PARITY | LOG(FAMINC) | PACKS |
|---|---|---|---|---|---|
| LOG(BWGHT) | 1.000000 | 0.064068 | 0.051516 | 0.099241 | -0.140674 |
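
Pearson’s r measures only linear association. A value near zero is therefore compatible with a strong non-linear relationship, while a perfect linear relationship gives r = 1. The Python sketch below illustrates both cases on small hypothetical data.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x = [-2, -1, 0, 1, 2]
print(pearson_r(x, [v ** 2 for v in x]))     # exact U-shaped relationship: r is zero
print(pearson_r(x, [2 * v + 1 for v in x]))  # exact linear relationship: r is one
```
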

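The plot of the residuals against packs suggests that packs may be correlated with u (the error term): the points follow a pattern rather than a random scatter. Because OLS residuals are uncorrelated with every included regressor by construction, however, such a plot can reveal only a non-linear pattern. It cannot by itself show that packs is linearly correlated with u.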

Figure 5.1
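
OLS residuals sum to zero and are orthogonal to every included regressor, which follows from the OLS normal equations. A residual plot can therefore display curvature but never a linear trend. The Python sketch below fits a straight line to hypothetical data generated from a quadratic. It shows that the residuals are uncorrelated with x even though they form a clear U shape.

```python
def ols_simple(x, y):
    """Intercept and slope of a simple OLS regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope


# Hypothetical data: y depends on x quadratically, so a linear fit is misspecified
x = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
y = [1.0 + 0.3 * a + 0.8 * a * a for a in x]
b0, b1 = ols_simple(x, y)
resid = [b - (b0 + b1 * a) for a, b in zip(x, y)]

# Residuals sum to zero and are orthogonal to x by construction...
print(sum(resid), sum(a * e for a, e in zip(x, resid)))
# ...yet they are positive at the ends and negative in the middle (a U shape).
print([round(e, 3) for e in resid])
```
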

The average cigarette price in each woman’s state of residence (cigprice) is a candidate instrumental variable for packs. A visual inspection of the scatter plot of packs against cigprice (Figure 6.1a), together with the correlation coefficient (r = 0.0097), suggests that the correlation between packs and cigprice is very weak. The first requirement, that the candidate instrument be correlated with the endogenous variable, is therefore met only barely, if at all. In addition, there appears to be no correlation between cigprice and the error term, as shown by the scatter plot of cigprice against the residuals (Figure 6.1b). This is consistent with the second property of a good instrumental variable (exogeneity), although that property cannot be tested directly. Overall, therefore, cigprice is a plausible but possibly weak instrumental variable for packs.

Figure 6.1
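
The relevance of cigprice can be put on a more formal footing by testing whether the reported correlation r = 0.0097 differs from zero. The Python sketch below uses the standard t-statistic for a correlation coefficient, t = r√(n − 2)/√(1 − r²), with n = 1388. It assumes that a normal approximation to the t(1386) distribution is adequate.

```python
import math
from statistics import NormalDist


def corr_t_stat(r, n):
    """t-statistic for H0: population correlation = 0, t = r * sqrt(n - 2) / sqrt(1 - r^2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


t = corr_t_stat(0.0097, 1388)             # r(packs, cigprice) and sample size from the text
p = 2 * (1 - NormalDist().cdf(abs(t)))    # normal approximation to t(1386)
print(f"t = {t:.3f}, p ~ {p:.3f}")
```

A t-statistic well below 2 means that the correlation is not statistically distinguishable from zero, which foreshadows the weak-instrument finding later in the assignment.
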

Motheduc, on the other hand, is not a good instrumental variable for packs. It has a very weak correlation, if any, with the endogenous variable packs, and it appears to be weakly correlated with the error term, as illustrated by the scatter plots in Figure 6.2.

7. Estimation using 2SLS, with cigprice as an instrumental variable for packs:

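$$
\widehat{\log(bwght)} = \underset{(0.5865)}{4.1792} + \underset{(0.4467)}{0.0884}\,male + \underset{(0.2439)}{0.1466}\,parity + \underset{(0.0950)}{0.0725}\,\log(faminc) + \underset{(1.8458)}{0.6976}\,packs
$$

Standard errors are in parentheses; $n = 1387$.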

Several important differences between the OLS and 2SLS estimates of equation (1) are evident:

- **Coefficients.** The estimated coefficients change substantially. The constant falls from 4.6756 to 4.1792. The coefficients on male (0.0262 to 0.0884), parity (0.0147 to 0.1466), and log(faminc) (0.0181 to 0.0725) all increase, and the coefficient on packs changes sign, from -0.0837 to 0.6976.
- **Standard errors.** The standard errors are far larger under 2SLS (for packs, 1.8458 compared with 0.0171), as is the standard error of the regression (0.3255 compared with 0.1876).
- **Individual significance.** The coefficients on male, parity, log(faminc), and packs, which were significant under OLS (P < 0.05), become insignificant (P > 0.05) under 2SLS.
- **Overall significance.** The model as a whole is statistically significant under OLS (P < 0.05) but becomes insignificant under 2SLS (F-statistic = 1.4639, P = 0.2108).
- **R-squared.** The 2SLS R-squared is negative (-1.9035). For instrumental-variable estimates, R-squared has no goodness-of-fit interpretation, so it cannot show that the regressors explain more of the variation in bwght.
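
To see why IV and OLS estimates can diverge, consider the simplest case of one endogenous regressor and one instrument. In that case, 2SLS reduces to cov(z, y)/cov(z, x). The Python sketch below simulates hypothetical data in which x shares a shock with the error term u. OLS is then inconsistent, while IV recovers the true slope of 2.0. The data-generating numbers are arbitrary assumptions and unrelated to the birth-weight sample.

```python
import random


def cov(a, b):
    """Sample covariance (divide by n - 1)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (n - 1)


random.seed(42)
n = 20_000
z = [random.gauss(0, 1) for _ in range(n)]          # instrument
v = [random.gauss(0, 1) for _ in range(n)]          # shock shared by x and u
u = [0.8 * vi + random.gauss(0, 0.6) for vi in v]   # structural error
x = [0.5 * zi + vi for zi, vi in zip(z, v)]         # endogenous regressor
y = [1.0 + 2.0 * xi + ui for xi, ui in zip(x, u)]   # true slope is 2.0

beta_ols = cov(x, y) / cov(x, x)   # inconsistent: picks up cov(x, u)
beta_iv = cov(z, y) / cov(z, x)    # simple IV estimator (just-identified, one regressor)
print(f"OLS: {beta_ols:.3f}, IV: {beta_iv:.3f}")
```

Shrinking the 0.5 that links z to x toward zero mimics a weak instrument. The IV estimate then becomes very noisy from one seed to the next, in the same way that the 2SLS standard errors above are large.
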

8. The Hausman test gives P > 0.05, so the null hypothesis that packs is exogenous is not rejected at the 5% level. The test therefore provides no statistical evidence of endogeneity.
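
The exact Hausman variant used in the assignment is not shown. As an illustration only, the Python sketch below applies the classic single-coefficient contrast H = (b_IV − b_OLS)²/(se_IV² − se_OLS²), which is chi-square(1) under the null, to the packs estimates in the appendix. It assumes that OLS is efficient under the null. It also ignores the one-observation difference between the two samples (1388 and 1387).

```python
import math
from statistics import NormalDist


def hausman_single(b_iv, se_iv, b_ols, se_ols):
    """Hausman contrast for one coefficient: (b_IV - b_OLS)^2 / (se_IV^2 - se_OLS^2),
    chi-square(1) under H0; its tail equals a two-sided normal tail at sqrt(H)."""
    h = (b_iv - b_ols) ** 2 / (se_iv ** 2 - se_ols ** 2)
    p = 2 * (1 - NormalDist().cdf(math.sqrt(h)))
    return h, p


# packs coefficient and standard error under 2SLS and OLS (appendix outputs)
h, p = hausman_single(0.697584, 1.845811, -0.083728, 0.017121)
print(f"H = {h:.4f}, p = {p:.3f}")
```

The very large 2SLS standard error inflates the denominator, so even a large gap between the two point estimates yields a p-value well above 0.05.
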

9. The first-stage regression for packs:

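$$
\widehat{packs} = \underset{(0.104001)}{0.137408} - \underset{(0.015854)}{0.004726}\,male + \underset{(0.008880)}{0.018149}\,parity - \underset{(0.008699)}{0.052637}\,\log(faminc) + \underset{(0.000776)}{0.000777}\,cigprice
$$

Standard errors are in parentheses; $n = 1388$, $R^2 = 0.0305$.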

The instrument cigprice is weak: its coefficient in the first-stage regression is statistically insignificant (t = 1.0009, P > 0.05).
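
With a single excluded instrument, the first-stage F-statistic on that instrument equals the square of its t-statistic. A widely used rule of thumb treats a first-stage F below about 10 as a sign of a weak instrument. The Python sketch below applies this check to the t-statistics of cigprice and cigprice(-1) from the two PACKS regressions in the appendix.

```python
def first_stage_f_single(t_stat):
    """With one excluded instrument, the first-stage F-statistic on that instrument equals t^2."""
    return t_stat ** 2


RULE_OF_THUMB = 10.0  # conventional threshold below which an instrument is considered weak
for label, t in [("cigprice", 1.000900), ("cigprice(-1)", 0.364834)]:
    f = first_stage_f_single(t)
    print(f"{label}: F = {f:.3f}, weak = {f < RULE_OF_THUMB}")
```

Both values fall far short of the threshold, which is consistent with the conclusion that cigprice is a weak instrument.
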

10. Estimation of the reduced form for packs:

$$
\widehat{packs} = \underset{(0.104029)}{0.200233} - \underset{(0.015872)}{0.004178}\,male + \underset{(0.008887)}{0.018063}\,parity - \underset{(0.008708)}{0.052142}\,\log(faminc) + \underset{(0.000778)}{0.000284}\,cigprice(-1)
$$

The coefficient on cigprice(-1) is not statistically significant (t = 0.3648, P > 0.05). Therefore, cigprice is not a good instrument for packs and should not be used as an instrument for packs to identify equation (1). This means that the 2SLS results from question 7 above are not valid.

Appendix

| | FAMINC | BWGHT | PARITY | MALE | PACKS |
|---|---|---|---|---|---|
| Mean | 29.02666 | 118.6996 | 1.632565 | 0.520893 | 0.104359 |
| Median | 27.50000 | 120.0000 | 1.000000 | 1.000000 | 0.000000 |
| Maximum | 65.00000 | 271.0000 | 6.000000 | 1.000000 | 2.500000 |
| Minimum | 0.500000 | 23.00000 | 1.000000 | 0.000000 | 0.000000 |
| Std. Dev. | 18.73928 | 20.35396 | 0.894027 | 0.499743 | 0.298634 |
| Skewness | 0.617620 | -0.145866 | 1.629925 | -0.083647 | 3.560448 |
| Kurtosis | 2.473396 | 6.147639 | 5.933811 | 1.006997 | 17.93397 |
| Jarque-Bera | 104.2811 | 577.9134 | 1112.359 | 231.3362 | 15830.76 |
| Probability | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| Sum | 40289.00 | 164755.0 | 2266.000 | 723.0000 | 144.8500 |
| Sum Sq. Dev. | 487060.0 | 574611.7 | 1108.608 | 346.3941 | 123.6961 |
| Observations | 1388 | 1388 | 1388 | 1388 | 1388 |

Dependent Variable: LOG(BWGHT); Method: Least Squares; Date: 09/29/13, Time: 16:36; Sample: 1 1388; Included observations: 1388

| Variable | Coefficient | Std. Error | t-Statistic |
|---|---|---|---|
| C | 4.675618 | 0.021881 | 213.6812 |
| MALE | 0.026241 | 0.010089 | 2.600832 |
| PARITY | 0.014729 | 0.005665 | 2.600231 |
| LOG(FAMINC) | 0.018050 | 0.005584 | 3.232601 |
| PACKS | -0.083728 | 0.017121 | -4.890393 |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| R-squared | 0.035038 | Mean dependent var | 4.760031 |
| Adjusted R-squared | 0.032247 | S.D. dependent var | 0.190662 |
| S.E. of regression | 0.187563 | Akaike info criterion | -0.505810 |
| Sum squared resid | 48.65368 | Schwarz criterion | -0.486950 |
| Log likelihood | 356.0321 | Hannan-Quinn criter. | -0.498757 |
| F-statistic | 12.55439 | Durbin-Watson stat | 1.931302 |
| Prob(F-statistic) | 0.000000 | | |

Dependent Variable: LOG(BWGHT); Method: Least Squares; Date: 09/29/13, Time: 17:14; Sample: 1 1388; Included observations: 1387

| Variable | Coefficient | Std. Error | t-Statistic |
|---|---|---|---|
| MALE | 0.279908 | 0.039412 | 7.102068 |
| PARITY | 0.324386 | 0.020833 | 15.57089 |
| LOG(FAMINC) | 0.216119 | 0.023411 | 9.231443 |
| PACKS | 0.511012 | 0.066764 | 7.653960 |
| MOTHEDUC | 0.251747 | 0.006290 | 40.02084 |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| R-squared | -14.168290 | Mean dependent var | 4.760094 |
| Adjusted R-squared | -14.212193 | S.D. dependent var | 0.190717 |
| S.E. of regression | 0.743848 | Akaike info criterion | 2.249639 |
| Sum squared resid | 764.6747 | Schwarz criterion | 2.268510 |
| Log likelihood | -1555.125 | Hannan-Quinn criter. | 2.256697 |
| Durbin-Watson stat | 1.900895 | | |

Dependent Variable: LOG(BWGHT); Method: Least Squares; Date: 09/29/13, Time: 17:50; Sample: 1 1388; Included observations: 1191

| Variable | Coefficient | Std. Error | t-Statistic |
|---|---|---|---|
| C | 4.675889 | 0.037493 | 124.7129 |
| MALE | 0.033601 | 0.010737 | 3.129422 |
| PARITY | 0.016445 | 0.006149 | 2.674333 |
| LOG(FAMINC) | 0.016037 | 0.008405 | 1.907997 |
| PACKS | -0.101457 | 0.020583 | -4.929124 |
| MOTHEDUC | -0.003389 | 0.002980 | -1.137328 |
| FATHEDUC | 0.003683 | 0.002614 | 1.409005 |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| R-squared | 0.042026 | Mean dependent var | 4.767536 |
| Adjusted R-squared | 0.037172 | S.D. dependent var | 0.188013 |
| S.E. of regression | 0.184485 | Akaike info criterion | -0.536634 |
| Sum squared resid | 40.29723 | Schwarz criterion | -0.506762 |
| Log likelihood | 326.5655 | Hannan-Quinn criter. | -0.525377 |
| F-statistic | 8.656993 | Durbin-Watson stat | 1.976307 |
| Prob(F-statistic) | 0.000000 | | |

Figure 1. Scatter plot of the residuals versus the fitted values (bwghtf)

Heteroskedasticity Test: White

| Statistic | Value | Distribution |
|---|---|---|
| F-statistic | 0.434372 | F(13, 1374) |
| Obs*R-squared | 5.681028 | Chi-Square(13) |
| Scaled explained SS | 32.57738 | Chi-Square(13) |

Heteroskedasticity Test: Breusch-Pagan-Godfrey

| Statistic | Value | Distribution |
|---|---|---|
| F-statistic | 0.282681 | F(4, 1383) |
| Obs*R-squared | 1.133884 | Chi-Square(4) |
| Scaled explained SS | 6.502162 | Chi-Square(4) |

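| | LOG(BWGHT) | MALE | PARITY | LOG(FAMINC) | PACKS |
|---|---|---|---|---|---|
| LOG(BWGHT) | 1.000000 | 0.064068 | 0.051516 | 0.099241 | -0.140674 |
| MALE | 0.064068 | 1.000000 | -0.013465 | -0.044251 | -0.000490 |
| PARITY | 0.051516 | -0.013465 | 1.000000 | -0.088097 | 0.068383 |
| LOG(FAMINC) | 0.099241 | -0.044251 | -0.088097 | 1.000000 | -0.163616 |
| PACKS | -0.140674 | -0.000490 | 0.068383 | -0.163616 | 1.000000 |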

Figure 5.1

Figure 6.1

Dependent Variable: LOG(BWGHT); Method: Two-Stage Least Squares; Date: 09/30/13, Time: 07:45; Sample (adjusted): 2 1388; Included observations: 1387 after adjustments; Instrument list: MALE(-1) PARITY(-1) LOG(FAMINC)(-1) CIGPRICE

| Variable | Coefficient | Std. Error | t-Statistic |
|---|---|---|---|
| C | 4.179208 | 0.586532 | 7.125284 |
| MALE | 0.088403 | 0.446658 | 0.197922 |
| PARITY | 0.146587 | 0.243883 | 0.601054 |
| LOG(FAMINC) | 0.072478 | 0.094981 | 0.763079 |
| PACKS | 0.697584 | 1.845811 | 0.377928 |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| R-squared | -1.903497 | Mean dependent var | 4.760081 |
| Adjusted R-squared | -1.911901 | S.D. dependent var | 0.190722 |
| S.E. of regression | 0.325454 | Sum squared resid | 146.3816 |
| F-statistic | 1.463930 | Durbin-Watson stat | 1.906329 |
| Prob(F-statistic) | 0.210824 | Second-Stage SSR | 49.79537 |

Estimation Command:

```
TSLS LOG(BWGHT) C MALE PARITY LOG(FAMINC) PACKS @ MALE(-1) PARITY(-1) LOG(FAMINC)(-1) CIGPRICE
```

Estimation Equation:

```
LOG(BWGHT) = C(1) + C(2)*MALE + C(3)*PARITY + C(4)*LOG(FAMINC) + C(5)*PACKS
```

Substituted Coefficients:

```
LOG(BWGHT) = 4.17920804871 + 0.0884032150072*MALE + 0.146586901071*PARITY + 0.072477803545*LOG(FAMINC) + 0.697583557787*PACKS
```

Dependent Variable: PACKS; Method: Least Squares; Date: 09/30/13, Time: 09:45; Sample: 1 1388; Included observations: 1388

| Variable | Coefficient | Std. Error | t-Statistic |
|---|---|---|---|
| C | 0.137408 | 0.104001 | 1.321219 |
| MALE | -0.004726 | 0.015854 | -0.298105 |
| PARITY | 0.018149 | 0.008880 | 2.043784 |
| LOG(FAMINC) | -0.052637 | 0.008699 | -6.050876 |
| CIGPRICE | 0.000777 | 0.000776 | 1.000900 |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| R-squared | 0.030454 | Mean dependent var | 0.104359 |
| Adjusted R-squared | 0.027650 | S.D. dependent var | 0.298634 |
| S.E. of regression | 0.294477 | Akaike info criterion | 0.396363 |
| Sum squared resid | 119.9291 | Schwarz criterion | 0.415223 |
| Log likelihood | -270.0760 | Hannan-Quinn criter. | 0.403417 |
| F-statistic | 10.86023 | Durbin-Watson stat | 1.944888 |
| Prob(F-statistic) | 0.000000 | | |

Dependent Variable: PACKS; Method: Least Squares; Date: 09/30/13, Time: 09:51; Sample (adjusted): 2 1388; Included observations: 1387 after adjustments

| Variable | Coefficient | Std. Error | t-Statistic |
|---|---|---|---|
| C | 0.200233 | 0.104029 | 1.924777 |
| MALE | -0.004178 | 0.015872 | -0.263210 |
| PARITY | 0.018063 | 0.008887 | 2.032382 |
| LOG(FAMINC) | -0.052142 | 0.008708 | -5.987568 |
| CIGPRICE(-1) | 0.000284 | 0.000778 | 0.364834 |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| R-squared | 0.029867 | Mean dependent var | 0.104434 |
| Adjusted R-squared | 0.027059 | S.D. dependent var | 0.298729 |
| S.E. of regression | 0.294660 | Akaike info criterion | 0.397606 |
| Sum squared resid | 119.9911 | Schwarz criterion | 0.416477 |
| Log likelihood | -270.7398 | Hannan-Quinn criter. | 0.404664 |
| F-statistic | 10.63681 | Durbin-Watson stat | 1.945395 |
| Prob(F-statistic) | 0.000000 | | |