Descriptive statistical methods Essay Example

Statistics

1. Descriptive statistical methods

The descriptive statistical data obtained from a sample of 50 consumers are as shown in the table below.

Table 1: Descriptive statistical data

| Descriptives | Income (\$1000s) | Household size | Amount charged (\$) |
|---|---|---|---|
| Standard Error | 2.057786 | | 132.0234 |
| Standard Deviation | 14.55074 | 1.738989 | 933.5463 |
| Sample Variance | 211.7241 | 3.024082 | 871508.7 |
| Kurtosis | -1.24772 | -0.72281 | -0.74248 |
| Skewness | 0.095856 | 0.527896 | -0.12886 |

From the table above, the mean income of the consumers is \$43,480 (SE = 2.06, in \$1000s), the mean household size is 3.42 (SE = 0.25), and the mean amount charged to credit cards is \$3,963.86 (SE = 132.02). The modes are \$42,000 for income, 2 for household size and \$4,090 for amount charged. The standard deviation, the typical deviation from the mean, is \$14,550 for income, 1.74 for household size and \$933.55 for amount charged. These values were obtained by taking the square root of the sample variance. All three kurtosis values are negative, indicating distributions with flatter peaks and lighter tails than a normal distribution. Skewness is slightly positive for income (0.10) and household size (0.53), indicating a mild skew to the right, and slightly negative for amount charged (-0.13), indicating a mild skew to the left. Furthermore, the range between the maximum and minimum values is \$46,000 for income, 6 for household size and \$3,814 for amount charged, so the mean amount charged is close in size to its range.
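The link between the variance, standard deviation and standard error rows of Table 1 can be checked with a short sketch. It assumes the usual formulas (standard deviation as the square root of the sample variance, standard error as the standard deviation divided by the square root of the sample size) and the sample size of 50 stated above; the dictionary keys and the function name are illustrative only.

```python
import math

N = 50  # sample size stated in the essay

# Sample variances copied from Table 1
sample_variance = {
    "income_thousands": 211.7241,
    "household_size": 3.024082,
    "amount_charged": 871508.7,
}


def sd_and_se(variance, n):
    """Return (standard deviation, standard error of the mean)."""
    sd = math.sqrt(variance)
    se = sd / math.sqrt(n)
    return sd, se


# Map each variable to its (SD, SE) pair
summary = {name: sd_and_se(var, N) for name, var in sample_variance.items()}

for name, (sd, se) in summary.items():
    print(f"{name}: SD = {sd:.4f}, SE = {se:.4f}")
```

Under these assumptions the computed values should agree with the standard deviation and standard error rows of the table to the precision shown there.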

2. Regression using either annual income or household size as independent variable

The two regression equations examine how annual income and household size, each used as the sole independent variable, predict annual credit card charges.

| Independent variables (N=50) | Annual household income (Eq.1) | Household size (Eq.2) |
|---|---|---|
| Constant | | |
| X Variable 1 | | |
| Adjusted R2 | | |
| Standard Error | | |
| F-change | | |
| Sig. level | | |

From the first equation above, where annual income is the independent variable, 39.7% (R2) of the variability in credit card charges is explained by annual household income. Moreover, the F-change of 31.72 indicates a good fit for the data, significant at the 0.01 level.

As shown in equation 2, 56.6% of the variability in credit card charges is explained by household size. The F-change value of 62.80 shows a very good fit for the data.

From the two equations, household size is the better predictor of annual credit card charges because it explains a larger share of the variability (56.6% versus 39.7%) and gives a larger F-change (62.80 versus 31.72), indicating a better fit for the data.
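The relationship between the R2 and F-change values quoted above can be illustrated with a minimal sketch. It assumes the standard identity for the overall F statistic of a regression with k predictors and n observations, F = (R2 / k) / ((1 - R2) / (n - k - 1)), with n = 50 and k = 1; the function name is hypothetical. Because the R2 figures in the text are rounded to three digits, the results should land close to, but not exactly on, the reported 31.72 and 62.80.

```python
def f_from_r_squared(r_squared, n, k=1):
    """Overall F statistic for a regression with k predictors and n observations."""
    df_model = k
    df_resid = n - k - 1
    return (r_squared / df_model) / ((1.0 - r_squared) / df_resid)


# R2 values as quoted in the text (rounded), n = 50 consumers
f_income = f_from_r_squared(0.397, n=50)
f_household = f_from_r_squared(0.566, n=50)

print(f"Eq.1 (income):         F = {f_income:.2f}")
print(f"Eq.2 (household size): F = {f_household:.2f}")
```

The larger F for household size follows directly from its larger R2, since both equations share the same sample size and number of predictors.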

3. Regression using both annual income and household size as independent variables

When household size and annual income are used together as independent variables to predict annual credit card charges, the regression yields the following estimates.

| Dependent variable: Annual credit card charges | Coefficient (p-value) |
|---|---|
| Constant | 1305.03** (0.000) |
| Household size | 356.34** (0.000) |
| Annual income | 33.12** (0.000) |
| Adjusted R2 | |
| F-change | |
| Sig. level | |

As shown in the table above, the coefficients for household size (356.34) and annual income (33.12) are both significant at the 0.01 level (reported p-values of 0.000), indicating that both variables influence credit card charges. The F-change of 111.07 shows that the model as a whole fits the data very well, as shown in the graph below.

This shows that annual income and household size both have a positive effect on annual credit card charges.

4. Predicted annual credit card charges

The general form of the regression equation is:

$$Y = \alpha + \beta_1 X_1 + \beta_2 X_2 \tag{i}$$

Where $\alpha$: Constant

$\beta_1$ and $\beta_2$: Beta coefficients

$X_1$ and $X_2$: Variables in the study (household size and annual income in \$1000s)

The household size is given as 3 and the annual income as \$40,000; because income is measured in \$1000s, it enters the equation as 40.

Annual credit card charge = 1305.03 + 356.34(3) + 33.12 (40)

= 1305.03 + 1069.02 + 1324.8

= 3698.85

The predicted annual credit card charge for the given annual income and household size is \$3,698.85.
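The calculation above can be reproduced with a short sketch that plugs the estimated coefficients into equation (i). The function name and parameter names are illustrative; the coefficients are those reported in the regression table, and income is passed in \$1000s.

```python
def predict_charge(household_size, income_thousands,
                   constant=1305.03, b_household=356.34, b_income=33.12):
    """Predicted annual credit card charge from the two-variable regression."""
    return constant + b_household * household_size + b_income * income_thousands


# Household of 3 with an annual income of $40,000 (entered as 40)
predicted = predict_charge(household_size=3, income_thousands=40)
print(f"Predicted annual charge: ${predicted:,.2f}")
```

The same function can be reused for other households, for example `predict_charge(5, 60)`, as long as the inputs stay within the range of the sample the model was fitted on.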

5. Influence of other independent variables

Besides annual income and household size, two other variables that could affect credit card use are age and gender. Adding these variables to the existing model would change the share of variability in the dependent variable that is explained, as well as the normal probability plots. Increasing the number of relevant independent variables can also be helpful because it may reduce the standard error of the estimate and improve the chances of identifying the best predictors of annual credit card charges. For example, age, which was not considered in the equation, could be a better predictor than household size, because young consumers tend to use credit cards for online shopping more than their older counterparts. Likewise, female consumers tend to shop for clothes, handbags and cosmetics and may in turn use credit cards more than males.

Task 02: In the file (Holmes Institute-Assignment 02)

1. Descriptive statistics

-Sample of healthy individuals

| | Florida | New York | North Carolina |
|---|---|---|---|
| Standard Error | 0.478347 | 0.492042 | 0.634429 |
| Standard Deviation | 2.139233 | 2.200478 | 2.837252 |
| Sample Variance | 4.576316 | 4.842105 | |
| Kurtosis | -1.06219 | 0.626432 | -0.90493 |
| Skewness | -0.27356 | 0.625687 | -0.05619 |

-Sample with chronic health conditions (Over 65 years)

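| | Florida | New York | North Carolina |
|---|---|---|---|
| Standard Error | 0.708965 | 0.923024 | 0.658847 |
| Standard Deviation | 3.170589 | | 2.946452 |
| Sample Variance | 10.05263 | 17.03947 | 8.681579 |
| Skewness | 0.280721 | 0.525352 | -0.04173 |

Kurtosis (two values reported): -0.03014, -0.59205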

From the tables above, the mean depression score was highest in New York, at 8 for healthy individuals and 15.25 for individuals with chronic health conditions. The most frequent depression score among people with chronic health conditions, 17, is found in Florida, while the most frequent score among healthy individuals is 8 in both New York and North Carolina. The standard deviation is higher for people with chronic health conditions than for healthy individuals in all three states. Among healthy individuals, the kurtosis values for Florida and North Carolina are negative, indicating flatter peaks and lighter tails than a normal distribution, while New York is the exception with a positive value. Likewise, the kurtosis values reported for people with chronic health conditions are negative. The skewness for healthy individuals is positive only in New York, indicating a skew to the right, and slightly negative in Florida and North Carolina. For people with chronic health conditions, skewness is positive in Florida and New York and negative only in North Carolina, indicating a slight skew to the left there. The range of depression scores is lowest in North Carolina for people with chronic health conditions and lowest in Florida for healthy individuals.

2. Hypothesis testing using analysis of variance

The hypotheses that can be formulated from the two datasets are:

H1: There is a significant difference in mean depression scores across geographic locations

H0: There is no significant difference in mean depression scores across geographic locations

From the table (1) above, the sample variance for healthy individuals in Florida and New York was lower than the sample variance for people with chronic health conditions in the same two states. In North Carolina, by contrast, the sample variances of the two groups were similar. Therefore, we fail to reject the null hypothesis that there is no significant difference in depression scores across geographic locations.
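As an illustration of how the analysis of variance itself could be carried out, the sketch below computes a one-way ANOVA F statistic from summary statistics for the chronic-condition sample. It rests on several assumptions that are not stated in the essay: equal group sizes of 20 per state (inferred from SE = SD / sqrt(n), since 3.170589 / 0.708965 is about 4.47), the state means of 14.5, 15.25 and 13.95 reported later in the essay, and the sample variances from the table. The function name is hypothetical, and this is a sketch rather than the analysis the author ran.

```python
def one_way_anova_from_summary(means, variances, n_per_group):
    """One-way ANOVA F statistic from group means and variances (equal n)."""
    k = len(means)
    grand_mean = sum(means) / k  # valid because group sizes are equal
    ss_between = n_per_group * sum((m - grand_mean) ** 2 for m in means)
    df_between = k - 1
    df_within = k * (n_per_group - 1)
    ms_between = ss_between / df_between
    ms_within = sum(variances) / k  # pooled variance under equal n
    return ms_between / ms_within, df_between, df_within


# Chronic-condition sample: Florida, New York, North Carolina
chronic_means = [14.5, 15.25, 13.95]
chronic_variances = [10.05263, 17.03947, 8.681579]

# n_per_group=20 is an assumption inferred from the standard errors
f_stat, df1, df2 = one_way_anova_from_summary(
    chronic_means, chronic_variances, n_per_group=20
)
print(f"F({df1}, {df2}) = {f_stat:.3f}")
```

Under these assumptions the F statistic comes out below 1, meaning the variation between state means is smaller than the variation within states, which is consistent with retaining the null hypothesis for this sample.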

3. Inferences about individual treatment means

The following hypotheses sought to establish a relationship between depression scores and the chronic health conditions of individuals within the three states.

H1: There is a relationship between depression scores and the prevalence of chronic health conditions within the three states

H0: There is no relationship between depression scores and the prevalence of chronic health conditions within the three states

The study found that people who were 65 years of age or older and had a chronic health condition such as arthritis, hypertension, and/or a heart ailment had higher mean depression scores (Florida: 14.5, New York: 15.25, and North Carolina: 13.95) than healthy individuals, whose scores were roughly half as high. We therefore reject the null hypothesis and accept the alternative hypothesis that there is a relationship between depression scores and the prevalence of chronic health conditions in Florida, New York and North Carolina.