Quantitative Assignment, MAE306 Essay Example

QUANTITATIVE ASSIGNMENT, MAE306

Trimester 2 2013

    1. Descriptive Statistics

      1. Weight of Newborn Babies

The mean weight of newborn babies (bwght) is 118.6996 with a standard deviation of 20.3540, and the median is 120. The maximum weight is 271 while the minimum is 23. The distribution is roughly symmetric (skewness of about -0.1459), although its tails are heavier than those of a normal distribution (kurtosis of about 6.15) and the Jarque-Bera test in the appendix rejects normality (P<0.01). The variable has 1388 observations.

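The mode of the variable male (equal to 1 if the child is male) is 1: about 52.1% of the children are male (mean of 0.5209). Because male is a binary indicator it cannot be normally distributed; the skewness value of about -0.0836 simply reflects the nearly even split between male and non-male children. The variable has 1388 observations.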

The median birth order of the children (parity) is 1. The mode (most prevalent) birth order of the children is also 1. The maximum birth order is 6. The variable has 1388 observations.

      1. Family Income

The average family income (faminc, recorded in thousands of dollars) is about $29,026.66 with a standard deviation of $18,739.28, while the median family income is about $27,500.00. The maximum family income is approximately $65,000 and the minimum is approximately $500. The distribution is moderately right-skewed, with a skewness value of about 0.6176. The variable has 1388 observations.

      1. Average Number of Packs of Cigarettes Smoked per Day

The mean number of packs of cigarettes smoked per day during pregnancy (packs) is about 0.1044 packs with a standard deviation of 0.2986. The maximum number of packs smoked per day by a pregnant mother is 2.5 while the minimum and the median are both zero, which means at least half of the respondents did not smoke at all during pregnancy. The data for the variable packs are strongly positively skewed, with a skewness value of about 3.5604. The variable has 1388 observations.
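As a rough check on the normality remarks above, the Jarque-Bera statistic can be recomputed from the sample size, skewness, and kurtosis reported in the appendix. The sketch below assumes the usual formula JB = n/6 * (S^2 + (K - 3)^2 / 4) and uses the fact that the chi-square upper tail with 2 degrees of freedom is exp(-x/2); the function names are illustrative and are not part of the original EViews output.

```python
import math


def jarque_bera(n, skewness, kurtosis):
    """Jarque-Bera statistic from sample size, skewness, and (non-excess) kurtosis."""
    return n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)


def chi2_sf_2df(x):
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom."""
    return math.exp(-x / 2.0)


# Values for bwght taken from the descriptive statistics in the appendix.
jb_bwght = jarque_bera(1388, -0.145866, 6.147639)
p_bwght = chi2_sf_2df(jb_bwght)

print(f"JB = {jb_bwght:.2f}, p-value = {p_bwght:.3g}")
```

The recomputed statistic should land very close to the 577.9134 shown in the appendix, with a p-value that is effectively zero, so a small skewness on its own does not establish normality.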

Descriptive Statistics

               FAMINC      BWGHT       PARITY      MALE        PACKS
Mean           29.02666    118.6996    1.632565    0.520893    0.104359
Median         27.50000    120.0000    1.000000    1.000000    0.000000
Maximum        65.00000    271.0000    6.000000    1.000000    2.500000
Minimum        0.500000    23.00000    1.000000    0.000000    0.000000
Std. Dev.      18.73928    20.35396    0.894027    0.499743    0.298634
Skewness       0.617620   -0.145866    1.629925   -0.083647    3.560448
Observations   1388        1388        1388        1388        1388

    2. Model Estimation

The following model estimates the effects of family income (faminc), birth order of the child (parity), male (if the child is male), and the average number of packs of cigarettes smoked per day during pregnancy (packs) on the weight of newborn babies (bwght):

log(bwght) = 4.6756 + 0.0262 male + 0.0147 parity + 0.0181 log(faminc) - 0.0837 packs

The standard errors are 0.0219 (constant), 0.0101 (male), 0.0057 (parity), 0.0056 (log(faminc)), and 0.0171 (packs).

Each coefficient, including the constant, is statistically significant (P<0.01). The p-values are as follows: 4.6756 (β0), P < 0.0001; 0.0262 (β1), P = 0.0094; 0.0147 (β2), P = 0.0094; 0.0181 (β3), P = 0.0013; -0.0837 (β4), P < 0.0001. The model as a whole is also statistically significant (F-statistic = 12.5544, P<0.01). However, only about 3.50% of the variation in the log of the weight of newborn babies, log(bwght), is accounted for by the set of independent variables.
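The overall F-statistic can be cross-checked against the R-squared reported in the appendix. The sketch below assumes the standard relation F = (R^2 / k) / ((1 - R^2) / (n - k - 1)) for a regression with an intercept and k slope coefficients; the function name is illustrative only.

```python
def overall_f(r_squared, n_obs, n_regressors):
    """Overall F-statistic implied by R-squared for a model with an intercept."""
    numerator = r_squared / n_regressors
    denominator = (1.0 - r_squared) / (n_obs - n_regressors - 1)
    return numerator / denominator


# R-squared, sample size, and number of slope coefficients from the OLS output.
f_stat = overall_f(0.035038, 1388, 4)

print(f"Implied F-statistic = {f_stat:.4f}")
```

The implied value agrees with the reported F-statistic of 12.5544 to within rounding, which shows how a model can be jointly significant even when it explains only a small share of the variation.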

The coefficients have different interpretations. Because the dependent variable is in logs, each coefficient multiplied by 100 gives the approximate percentage change in birth weight for a one-unit change in the regressor, holding the other regressors constant. Since male is a binary indicator, its coefficient means that a male child is expected to weigh about 2.624% more than a non-male child.

A one-unit change in birth order (parity) is associated with a 1.47% change in the weight of newborn babies. For instance, an increase in parity by one unit would result in an increase in the weight of the newborn child of about 1.47%.

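Because both birth weight and family income enter the model in logs, the coefficient on log(faminc) is an elasticity: a one percent (1%) change in family income is associated with a change of about 0.018% in the weight of newborn babies. Since the relationship between the two variables is positive, an increase in family income is associated with an increase in the weight of newborn babies and vice versa.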

A one-unit change in packs is associated with a change of about 8.37% in the weight of the newborn baby in the opposite direction. One additional pack smoked per day during pregnancy would likely decrease the weight of the newborn baby by about 8.37%, while one pack fewer per day would likely increase it by about the same amount.
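The percentages quoted for male and packs are the usual approximation for a log-linear model, in which 100 times the coefficient is read as a percentage change. A minimal sketch of the exact conversion, 100 * (exp(b) - 1), is given below; the function name is an illustrative assumption, and the coefficients are taken from the OLS output in the appendix.

```python
import math


def exact_percent_effect(coef, delta=1.0):
    """Exact percentage change in y implied by a change of `delta` in x when log(y) is the regressand."""
    return 100.0 * (math.exp(coef * delta) - 1.0)


male_effect = exact_percent_effect(0.026241)    # male = 1 versus male = 0
packs_effect = exact_percent_effect(-0.083728)  # one additional pack per day

print(f"male:  {male_effect:.3f}%")
print(f"packs: {packs_effect:.3f}%")
```

For small coefficients such as the one on male, the approximate and exact figures are almost identical; for packs the exact decrease is slightly smaller in magnitude than the approximate 8.37%.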

  3. Effects of Mother and Father’s Years of Education

Including motheduc (years of schooling of the child’s mother) in the regression model does not cause any significant change in the constant (4.6756±0.0219) or in the coefficients of male (0.0262±0.0101), parity (0.0147±0.0057), log(faminc) (0.0181±0.0056), and packs (-0.0837±0.0171). Because the p-value of motheduc (0.7306) is greater than the alpha level of 0.05, the null hypothesis that β5 = 0 (motheduc has no linear effect on the log of bwght) is not rejected.

Including fatheduc (father’s years of schooling) in the model along with motheduc causes the coefficient on log(faminc) to become non-significant at the 5% level (t = 1.908, P>0.05). Neither motheduc (t = -1.137) nor fatheduc (t = 1.409) is individually significant. In spite of that, the percentage of the variation in log(bwght) accounted for by the set of six independent variables increases to about 4.20% (R-squared = 0.0420), although this regression uses only 1191 observations.

First, a visual inspection of the scatter plot of the residuals versus the fitted values (figure 1 in the appendix) suggests that heteroskedasticity may exist in the model; the envelope of residuals is somewhat uneven along the x-axis (bwghtf). However, White’s heteroskedasticity test, whose null hypothesis is no heteroskedasticity, implies that heteroskedasticity does not exist in the model. The null hypothesis is not rejected because the p-value (0.9570) is greater than the alpha level (0.05). Second, the Breusch-Pagan-Godfrey test, whose null hypothesis is that the conditional variance is constant, also fails to reject the null (P>0.05), which again suggests that there is no heteroskedasticity.
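The Breusch-Pagan-Godfrey conclusion can be illustrated from the Obs*R-squared value in the appendix (1.133884), which is compared with a chi-square distribution with 4 degrees of freedom. The sketch below is only an illustration: it relies on the closed-form upper tail of the chi-square distribution for an even number of degrees of freedom, and the helper name is an assumption rather than anything produced by EViews.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability, valid only for even degrees of freedom."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    # Sum of (x/2)^k / k! for k = 0 .. df/2 - 1
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


lm_stat = 1.133884  # Obs*R-squared from the Breusch-Pagan-Godfrey output
p_value = chi2_sf_even_df(lm_stat, 4)

print(f"LM = {lm_stat}, p-value = {p_value:.4f}")
```

A p-value this far above 0.05 is consistent with the decision not to reject the null hypothesis of constant conditional variance.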

There is no evidence of autocorrelation in the residuals: the Durbin-Watson statistic (1.931302) is close to 2.0, the value that indicates no first-order autocorrelation.
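To show what the Durbin-Watson statistic measures, the sketch below computes it from a list of residuals and also uses the common approximation DW ≈ 2(1 - rho) to back out the first-order autocorrelation implied by the reported value. The toy residuals are made up purely for illustration and do not come from the data set.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences over sum of squares."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


# Hypothetical residuals, used only to show the calculation.
toy_residuals = [0.5, -0.3, 0.2, -0.4, 0.1]
dw_toy = durbin_watson(toy_residuals)

# First-order autocorrelation implied by the reported statistic.
implied_rho = 1.0 - 1.931302 / 2.0

print(f"toy DW = {dw_toy:.4f}, implied rho = {implied_rho:.4f}")
```

The implied autocorrelation of roughly 0.03 is very small, which is in line with the conclusion drawn from the statistic.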

All the variables (faminc, parity, male, and packs) appear to have a non-linear effect on the birth weight of newborn babies. First, the scatter plot of each variable against bwght illustrates a non-linear relationship. Additionally, the correlation coefficients (r) between the log of bwght and each variable are close to zero (between -0.14 and 0.10), implying that any linear relationship is weak.
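A correlation coefficient close to zero rules out a strong linear relationship but not a non-linear one. The sketch below illustrates this with made-up data in which y depends exactly on x through a quadratic, yet the Pearson correlation is zero; the data and the function name are hypothetical and are not drawn from the assignment's data set.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: y is an exact quadratic function of x.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x * x for x in xs]
r = pearson_r(xs, ys)

print(f"r = {r:.4f}")
```

This is why the scatter plots, and not the correlation coefficients alone, carry the argument about non-linearity.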


Figure 4.1

Correlation Matrix

              LOG(BWGHT)   MALE        PARITY      LOG(FAMINC)  PACKS
LOG(BWGHT)     1.000000    0.064068    0.051516    0.099241    -0.140674

Inspecting the plot of the residuals against packs, one would expect packs to be correlated with the error term u. The points are not randomly scattered but follow a pattern, suggesting that the variable packs may be correlated with the error term.


Figure 5.1

The average cigarette price in each woman’s state of residence (cigprice) is likely to satisfy the properties of a good instrumental variable for packs. A visual inspection of the scatter plot of packs versus cigprice (figure 6.1a), as well as the correlation coefficient (r = 0.0097), suggests a correlation between packs and cigprice, although a weak one. On that basis, the requirement of correlation between the stochastic variable and the candidate instrument is met. Additionally, there seems to be no correlation between cigprice and the error term, as shown by the scatter plot of cigprice against the residuals (figure 6.1b), which satisfies the second property of a good instrumental variable. In general, therefore, cigprice appears to be a good instrumental variable for packs.


Figure 6.1

Motheduc, on the other hand, is not a good instrumental variable for packs because it has a very weak correlation, if any, with the stochastic variable packs, and it is likely to be correlated (although weakly) with the error term, as illustrated by the scatter plots in figure 6.2.


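  7. Estimation using 2SLS, where cigprice is an instrumental variable for packs:

log(bwght) = 4.1792 + 0.0884 male + 0.1466 parity + 0.0725 log(faminc) + 0.6976 packs

The standard errors are 0.5865 (constant), 0.4467 (male), 0.2439 (parity), 0.0950 (log(faminc)), and 1.8458 (packs).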

A number of important differences between the OLS estimates of equation (1) and the 2SLS estimates are evident. First, the estimated coefficients change substantially. The constant decreases under 2SLS (from 4.6756 to 4.1792), while the coefficients for male, parity, and log(faminc) increase, and the coefficient for packs changes sign (from -0.0837 to 0.6976). The standard errors also increase sharply. Second, the coefficients of male, parity, log(faminc), and packs that were significant under OLS (P<0.05) become insignificant (P>0.05) when 2SLS is applied. In addition, although the model is statistically significant (P<0.05) when OLS is used, it becomes insignificant under 2SLS (F-statistic = 1.4639, P = 0.2108), and the R-squared is negative (-1.9035), so the 2SLS model fits the data worse than the OLS model.
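To illustrate why OLS and instrumental-variable estimates can differ when a regressor is correlated with the error term, the toy example below compares the two estimators in the simplest case of one regressor and one instrument, where 2SLS reduces to cov(z, y) / cov(z, x). All numbers are invented for the illustration (the true slope is set to 2) and have nothing to do with the birth weight data.

```python
def mean(values):
    return sum(values) / len(values)


def cov(a, b):
    """Sample covariance (population form; the scaling cancels in the ratios below)."""
    mean_a, mean_b = mean(a), mean(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / len(a)


# Hypothetical data: z is the instrument, v is an unobserved shock.
z = [-1, -1, 1, 1]
v = [-1, 1, -1, 1]
x = [zi + vi for zi, vi in zip(z, v)]      # regressor depends on z and on v
u = v                                      # error term shares v, so x is endogenous
y = [2 * xi + ui for xi, ui in zip(x, u)]  # true slope is 2

ols_slope = cov(x, y) / cov(x, x)
iv_slope = cov(z, y) / cov(z, x)

print(f"OLS slope = {ols_slope}, IV slope = {iv_slope}")
```

In this constructed case the instrument is strongly related to x and unrelated to u, so the IV estimate recovers the true slope while OLS does not; when the instrument is only weakly related to the regressor, the denominator cov(z, x) is close to zero and the IV estimate becomes very imprecise.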

  8. The results of the Hausman test show the existence of endogeneity (P>0.05).

  9. The first-stage regression for packs:

packs = 0.1374 - 0.0047 male + 0.0181 parity - 0.0526 log(faminc) + 0.000777 cigprice

The instrument cigprice is weak: its coefficient in the first-stage regression is insignificant (t = 1.0009, P>0.05).
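With a single excluded instrument, the first-stage F-statistic for that instrument is simply the square of its t-statistic. The sketch below applies this to the t-statistic for cigprice in the appendix (1.000900) and compares it with the commonly used rule-of-thumb threshold of 10; the threshold is a convention assumed here for illustration, not something stated in the assignment.

```python
# t-statistic of CIGPRICE in the first-stage regression (from the appendix).
t_cigprice = 1.000900

# With one excluded instrument, the first-stage F equals t squared.
first_stage_f = t_cigprice ** 2

# Assumed rule-of-thumb threshold for a "strong" instrument.
RULE_OF_THUMB = 10.0
is_weak = first_stage_f < RULE_OF_THUMB

print(f"First-stage F = {first_stage_f:.4f}, weak instrument: {is_weak}")
```

An F-statistic of about 1 is far below that threshold, which supports the description of cigprice as a weak instrument.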

  10. Estimation of the reduced form for packs: packs = 0.200233 - 0.004178 male + 0.018063 parity - 0.052142 log(faminc) + 0.000284 cigprice(-1). The coefficient on cigprice is not significant in the model (t = 0.3648, P>0.05). Therefore, cigprice is not a good instrument for packs and should not be used to identify equation (1). This means the answer to question 7 above is not valid.

Appendix

               FAMINC      BWGHT       PARITY      MALE        PACKS
Mean           29.02666    118.6996    1.632565    0.520893    0.104359
Median         27.50000    120.0000    1.000000    1.000000    0.000000
Maximum        65.00000    271.0000    6.000000    1.000000    2.500000
Minimum        0.500000    23.00000    1.000000    0.000000    0.000000
Std. Dev.      18.73928    20.35396    0.894027    0.499743    0.298634
Skewness       0.617620   -0.145866    1.629925   -0.083647    3.560448
Kurtosis       2.473396    6.147639    5.933811    1.006997    17.93397

Jarque-Bera    104.2811    577.9134    1112.359    231.3362    15830.76
Probability    0.000000    0.000000    0.000000    0.000000    0.000000

Sum            40289.00    164755.0    2266.000    723.0000    144.8500
Sum Sq. Dev.   487060.0    574611.7    1108.608    346.3941    123.6961

Observations   1388        1388        1388        1388        1388

Dependent Variable: LOG(BWGHT)
Method: Least Squares
Date: 09/29/13 Time: 16:36
Sample: 1 1388
Included observations: 1388

Variable       Coefficient   Std. Error   t-Statistic   Prob.
C                 4.675618     0.021881      213.6812   0.0000
MALE              0.026241     0.010089      2.600832   0.0094
PARITY            0.014729     0.005665      2.600231   0.0094
LOG(FAMINC)       0.018050     0.005584      3.232601   0.0013
PACKS            -0.083728     0.017121     -4.890393   0.0000

R-squared             0.035038    Mean dependent var       4.760031
Adjusted R-squared    0.032247    S.D. dependent var       0.190662
S.E. of regression    0.187563    Akaike info criterion   -0.505810
Sum squared resid     48.65368    Schwarz criterion       -0.486950
Log likelihood        356.0321    Hannan-Quinn criter.    -0.498757
F-statistic           12.55439    Durbin-Watson stat       1.931302
Prob(F-statistic)     0.000000

Dependent Variable: LOG(BWGHT)
Method: Least Squares
Date: 09/29/13 Time: 17:14
Sample: 1 1388
Included observations: 1387

Variable       Coefficient   Std. Error   t-Statistic
MALE              0.279908     0.039412      7.102068
PARITY            0.324386     0.020833      15.57089
LOG(FAMINC)       0.216119     0.023411      9.231443
PACKS             0.511012     0.066764      7.653960
MOTHEDUC          0.251747     0.006290      40.02084

R-squared            -14.168290   Mean dependent var       4.760094
Adjusted R-squared   -14.212193   S.D. dependent var       0.190717
S.E. of regression    0.743848    Akaike info criterion    2.249639
Sum squared resid     764.6747    Schwarz criterion        2.268510
Log likelihood       -1555.125    Hannan-Quinn criter.     2.256697
Durbin-Watson stat    1.900895

Dependent Variable: LOG(BWGHT)
Method: Least Squares
Date: 09/29/13 Time: 17:50
Sample: 1 1388
Included observations: 1191

Variable       Coefficient   Std. Error   t-Statistic
C                 4.675889     0.037493      124.7129
MALE              0.033601     0.010737      3.129422
PARITY            0.016445     0.006149      2.674333
LOG(FAMINC)       0.016037     0.008405      1.907997
PACKS            -0.101457     0.020583     -4.929124
MOTHEDUC         -0.003389     0.002980     -1.137328
FATHEDUC          0.003683     0.002614      1.409005

R-squared             0.042026    Mean dependent var       4.767536
Adjusted R-squared    0.037172    S.D. dependent var       0.188013
S.E. of regression    0.184485    Akaike info criterion   -0.536634
Sum squared resid     40.29723    Schwarz criterion       -0.506762
Log likelihood        326.5655    Hannan-Quinn criter.    -0.525377
F-statistic           8.656993    Durbin-Watson stat       1.976307
Prob(F-statistic)     0.000000


Figure 1. Scatter plot of the residuals versus the fitted values (bwghtf)

Heteroskedasticity Test: White

F-statistic            0.434372    Prob. F(13,1374)
Obs*R-squared          5.681028    Prob. Chi-Square(13)
Scaled explained SS    32.57738    Prob. Chi-Square(13)

Heteroskedasticity Test: Breusch-Pagan-Godfrey

F-statistic            0.282681    Prob. F(4,1383)
Obs*R-squared          1.133884    Prob. Chi-Square(4)
Scaled explained SS    6.502162    Prob. Chi-Square(4)

              LOG(BWGHT)   MALE        PARITY      LOG(FAMINC)  PACKS
LOG(BWGHT)     1.000000    0.064068    0.051516    0.099241    -0.140674
MALE           0.064068    1.000000   -0.013465   -0.044251    -0.000490
PARITY         0.051516   -0.013465    1.000000   -0.088097     0.068383
LOG(FAMINC)    0.099241   -0.044251   -0.088097    1.000000    -0.163616
PACKS         -0.140674   -0.000490    0.068383   -0.163616     1.000000



Figure 5.1


Figure 6.1


Dependent Variable: LOG(BWGHT)
Method: Two-Stage Least Squares
Date: 09/30/13 Time: 07:45
Sample (adjusted): 2 1388
Included observations: 1387 after adjustments
Instrument list: MALE(-1) PARITY(-1) LOG(FAMINC)(-1) CIGPRICE

Variable       Coefficient   Std. Error   t-Statistic
C                 4.179208     0.586532      7.125284
MALE              0.088403     0.446658      0.197922
PARITY            0.146587     0.243883      0.601054
LOG(FAMINC)       0.072478     0.094981      0.763079
PACKS             0.697584     1.845811      0.377928

R-squared            -1.903497    Mean dependent var       4.760081
Adjusted R-squared   -1.911901    S.D. dependent var       0.190722
S.E. of regression    0.325454    Sum squared resid        146.3816
F-statistic           1.463930    Durbin-Watson stat       1.906329
Prob(F-statistic)     0.210824    Second-Stage SSR         49.79537

Estimation Command:

=========================

TSLS LOG(BWGHT) C MALE PARITY LOG(FAMINC) PACKS @ MALE(-1) PARITY(-1) LOG(FAMINC)(-1) CIGPRICE

Estimation Equation:

=========================

LOG(BWGHT) = C(1) + C(2)*MALE + C(3)*PARITY + C(4)*LOG(FAMINC) + C(5)*PACKS

Substituted Coefficients:

=========================

LOG(BWGHT) = 4.17920804871 + 0.0884032150072*MALE + 0.146586901071*PARITY + 0.072477803545*LOG(FAMINC) + 0.697583557787*PACKS

Dependent Variable: PACKS
Method: Least Squares
Date: 09/30/13 Time: 09:45
Sample: 1 1388
Included observations: 1388

Variable       Coefficient   Std. Error   t-Statistic
C                 0.137408     0.104001      1.321219
MALE             -0.004726     0.015854     -0.298105
PARITY            0.018149     0.008880      2.043784
LOG(FAMINC)      -0.052637     0.008699     -6.050876
CIGPRICE          0.000777     0.000776      1.000900

R-squared             0.030454    Mean dependent var       0.104359
Adjusted R-squared    0.027650    S.D. dependent var       0.298634
S.E. of regression    0.294477    Akaike info criterion    0.396363
Sum squared resid     119.9291    Schwarz criterion        0.415223
Log likelihood       -270.0760    Hannan-Quinn criter.     0.403417
F-statistic           10.86023    Durbin-Watson stat       1.944888
Prob(F-statistic)     0.000000

Dependent Variable: PACKS
Method: Least Squares
Date: 09/30/13 Time: 09:51
Sample (adjusted): 2 1388
Included observations: 1387 after adjustments

Variable       Coefficient   Std. Error   t-Statistic
C                 0.200233     0.104029      1.924777
MALE             -0.004178     0.015872     -0.263210
PARITY            0.018063     0.008887      2.032382
LOG(FAMINC)      -0.052142     0.008708     -5.987568
CIGPRICE(-1)      0.000284     0.000778      0.364834

R-squared             0.029867    Mean dependent var       0.104434
Adjusted R-squared    0.027059    S.D. dependent var       0.298729
S.E. of regression    0.294660    Akaike info criterion    0.397606
Sum squared resid     119.9911    Schwarz criterion        0.416477
Log likelihood       -270.7398    Hannan-Quinn criter.     0.404664
F-statistic           10.63681    Durbin-Watson stat       1.945395
Prob(F-statistic)     0.000000