Descriptive statistical methods Essay Example
Statistics

Descriptive statistical methods
The descriptive statistical data obtained from a sample of 50 consumers are as shown in the table below.
Table 1: Descriptive statistical data
| Descriptives | Income ($1000s) | Household size | Amount charged ($) |
|---|---|---|---|
| Standard Error | 2.057786 | | 132.0234 |
| Standard Deviation | 14.55074 | 1.738989 | 933.5463 |
| Sample Variance | 211.7241 | 3.024082 | 871508.7 |
| Kurtosis | 1.24772 | 0.72281 | 0.74248 |
| Skewness | 0.095856 | 0.527896 | 0.12886 |
From the table above, the mean income of the consumers is $43,480 (SE = 2.06, in $1000s), their mean household size is 3.42 (SE = 0.24), and the mean amount charged by credit card users is $3,963.86 (SE = 132.02). The modes are 2 for household size, $4,090 for amount charged and $42,000 for income. The standard deviation, which measures the typical deviation from the mean, is $14,551 for income, 1.74 for household size and $933.55 for amount charged. These values are the square roots of the sample variances. The kurtosis values are negative, indicating distributions with flatter peaks and lighter tails than a normal distribution. The skewness value is negative, indicating a distribution skewed to the left. Furthermore, the range between the maximum and minimum values is $46,000 for income, 6 for household size and $3,814 for amount charged. The mean amount charged ($3,963.86) is therefore close to the range of the amounts charged.
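The link between the variance, standard deviation and standard error rows of Table 1 can be checked with a short calculation. The sketch below is illustrative only: it assumes n = 50 (the sample size stated above), takes the sample variances from the table, and uses made-up variable and function names.

```python
import math

N = 50  # sample size stated in the text

# Sample variances as reported in Table 1
sample_variance = {
    "income_1000s": 211.7241,
    "household_size": 3.024082,
    "amount_charged": 871508.7,
}

def std_dev(variance):
    """Standard deviation is the square root of the sample variance."""
    return math.sqrt(variance)

def std_error(variance, n):
    """Standard error of the mean is the standard deviation divided by sqrt(n)."""
    return std_dev(variance) / math.sqrt(n)

# Map each variable to its (standard deviation, standard error) pair
summary = {
    name: (std_dev(v), std_error(v, N))
    for name, v in sample_variance.items()
}

for name, (sd, se) in summary.items():
    print(f"{name}: SD = {sd:.4f}, SE = {se:.4f}")
```

Under these assumptions the results for income and amount charged agree with the standard deviations and standard errors in Table 1, and the household size standard error comes out at about 0.246, close to the 0.24 quoted in the text.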

Regression using either annual income or household size as independent variable
The two regression equations examine how annual income and household size, each taken in turn as the independent variable, predict annual credit card charges.
| Independent variables (N=50) | Annual household income (Eq.1) | Household size (Eq.2) |
|---|---|---|
| Constant | | |
| X Variable 1 | | |
| Adjusted R^{2} | | |
| Standard Error | | |
| Fchange | | |
| Sig. level | | |
From the first equation, where annual income is the independent variable, 39.7% (R^{2}) of the variability in credit card charges is explained by annual household income. Moreover, an Fchange of 31.72 indicates a good fit for the data at the p=0.01 level.
As shown in equation 2, 56.6% of the variability in credit card charges is explained by household size. The Fchange value of 62.80 indicates a very good fit for the data.
Of the two equations, household size is the better predictor of annual credit card charges because it explains a larger share of the variability (56.6% against 39.7%) and gives a better fit for the data under normal probability plots.
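For a regression with a single predictor, the F statistic can be recovered from R^{2} and the sample size as F = (R^{2} / (1 - R^{2})) × (n - 2). The sketch below applies that identity to the two percentages reported above. It assumes they are unadjusted R^{2} values and that n = 50; the function name is illustrative. Because the inputs are rounded, the results only approximate the reported Fchange values.

```python
def f_from_r2(r2, n, k=1):
    """Overall F statistic for a regression with k predictors and n observations."""
    explained_per_df = r2 / k
    unexplained_per_df = (1.0 - r2) / (n - k - 1)
    return explained_per_df / unexplained_per_df

N = 50  # sample size stated in the text

# R^2 values quoted in the text (assumed to be unadjusted)
f_income = f_from_r2(0.397, N)
f_household = f_from_r2(0.566, N)

print(f"Eq.1 (annual income):  F = {f_income:.2f}")
print(f"Eq.2 (household size): F = {f_household:.2f}")
```

Under these assumptions the two values come out at roughly 31.6 and 62.6, close to the reported 31.72 and 62.80, and the ordering confirms that the household size equation has the stronger fit.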

Regression using both annual income and household size as independent variables
When household size and annual income are used together as independent variables to predict annual credit card charges, the following estimates are obtained.
| Dependent variable: Annual credit card charges | |
|---|---|
| Constant | 1305.03** (0.000) |
| Household size | 356.34** (0.000) |
| Annual income | 33.12** (0.000) |
| Adjusted R^{2} | |
| Fchange | |
| Sig. level | |
As shown in the table above, the coefficients for household size (356.34) and annual income (33.12) are both significant at the p=0.01 level, so both variables influence credit card charges among users. An Fchange of 111.07 indicates a very good overall fit for the data, as shown in the graph below.
This shows that annual income and household size both have a positive effect on annual credit card charges.
Predicted annual credit card charges
From the regression above, the general form of the equation is:
Y = α + β_{1}X_{1} + β_{2}X_{2} (i)
Where Y: Predicted annual credit card charges ($)
α: Constant
β_{1} and β_{2}: Regression coefficients for household size and annual income
X_{1}: Household size
X_{2}: Annual income ($1000s)
The household size is given as 3 and the annual income as $40,000, which enters the equation as 40 because income is measured in $1000s.
Annual credit card charge = 1305.03 + 356.34(3) + 33.12 (40)
= 1305.03 + 1069.02 + 1324.8
= 3698.85
The predicted annual credit card charge for the given annual income and household size is $3,698.85.
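The same calculation can be written as a small function, which makes it easy to try other households. This is a minimal sketch using the three coefficients reported in the regression table above; the function and constant names are illustrative, and income is converted to $1000s because that is the unit used in the data.

```python
# Coefficients reported in the multiple regression table
INTERCEPT = 1305.03
B_HOUSEHOLD_SIZE = 356.34  # per additional household member
B_INCOME_1000S = 33.12     # per $1000 of annual income

def predict_charge(household_size, income_dollars):
    """Predicted annual credit card charge in dollars."""
    income_1000s = income_dollars / 1000.0  # the model uses income in $1000s
    return (
        INTERCEPT
        + B_HOUSEHOLD_SIZE * household_size
        + B_INCOME_1000S * income_1000s
    )

predicted = predict_charge(household_size=3, income_dollars=40_000)
print(f"Predicted annual charge: ${predicted:,.2f}")
```

For a household of 3 with an income of $40,000 this reproduces the hand calculation above.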

Influence of other independent variables
Besides annual income and household size, two other variables that could affect credit card use are age and gender. Adding these variables to the existing model would change the share of variability in the dependent variable that is explained, as well as its normal probability plots. Moreover, increasing the number of independent variables can be helpful because relevant predictors reduce the standard error of the estimate and make it easier to identify the best predictors of annual credit card usage. For example, age, which was not considered in the equation, could be a better predictor than household size. This is because young consumers tend to use credit cards for online shopping more than their older counterparts. Similarly, female consumers tend to shop for clothes, handbags and cosmetics and may in turn use credit cards more than males.
Task 02: In the file (Holmes Institute Assignment 02)

Descriptive statistics: healthy individuals
| | Florida | New York | North Carolina |
|---|---|---|---|
| Standard Error | 0.478347 | 0.492042 | 0.634429 |
| Standard Deviation | 2.139233 | 2.200478 | 2.837252 |
| Sample Variance | 4.576316 | 4.842105 | |
| Kurtosis | 1.06219 | 0.626432 | 0.90493 |
| Skewness | 0.27356 | 0.625687 | 0.05619 |

Descriptive statistics: sample with chronic health conditions (over 65 years)
| | Florida | New York | North Carolina |
|---|---|---|---|
| Standard Error | 0.708965 | 0.923024 | 0.658847 |
| Standard Deviation | 3.170589 | | 2.946452 |
| Sample Variance | 10.05263 | 17.03947 | 8.681579 |
| Kurtosis | 0.03014 | 0.59205 | |
| Skewness | 0.280721 | 0.525352 | 0.04173 |
From the tables above, the mean depression score was highest in New York, both for healthy individuals (μ=8) and for those with chronic health conditions (μ=15.25). A modal depression score of 17 is found in Florida, while the modal score for healthy individuals is 8 in both New York and North Carolina. In all three states the standard deviation is higher for individuals with chronic health conditions than for healthy individuals. Among healthy individuals, the kurtosis values for Florida and North Carolina are negative, indicating distributions with flatter peaks and lighter tails than a normal distribution; New York is the exception. For individuals with chronic health conditions, the kurtosis values are negative in all three states, again indicating flatter peaks and lighter tails. The skewness value for healthy individuals is positive in North Carolina and New York, indicating distributions skewed to the right. However, the value is negative for individuals with chronic health conditions in North Carolina, indicating a distribution skewed to the left. The range of depression scores is low in North Carolina for individuals with chronic health conditions and lower in Florida for healthy individuals.

Hypothesis testing using analysis of variance
The hypotheses that can be formulated from the two datasets are:
H_{11}: There is a significant difference in the depression scores of individuals between geographic locations
H_{10}: There is no significant difference in the depression scores of individuals between geographic locations
From the tables above, the sample variance for healthy individuals in Florida and New York was lower than the sample variance for individuals with chronic health conditions in the same two states. In North Carolina, by contrast, the difference in sample variance between healthy individuals and those with chronic health conditions was relatively small. Therefore, we fail to reject the null hypothesis that there is no significant difference in the depression scores of individuals between geographic locations.
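A one-way analysis of variance can also be computed directly from summary statistics. The sketch below does so for the chronic health condition sample only, using the state means quoted in the text (14.5, 15.25 and 13.95) and the sample variances from the table. The group size of 20 per state is an assumption, inferred from the ratio of standard deviation to standard error in the tables, and the function name is illustrative.

```python
def one_way_anova_from_summary(means, variances, n_per_group):
    """One-way ANOVA F statistic from group means and variances (equal group sizes)."""
    k = len(means)
    grand_mean = sum(means) / k  # valid because all groups have the same size

    # Between-group variation: how far each state mean lies from the grand mean
    ss_between = n_per_group * sum((m - grand_mean) ** 2 for m in means)
    df_between = k - 1
    ms_between = ss_between / df_between

    # Within-group variation: pooled variance (simple average for equal sizes)
    df_within = k * (n_per_group - 1)
    ms_within = sum(variances) / k

    return ms_between / ms_within, df_between, df_within

# Florida, New York, North Carolina (chronic health condition sample)
means = [14.5, 15.25, 13.95]
variances = [10.05263, 17.03947, 8.681579]
N_PER_GROUP = 20  # assumption: inferred from SD / SE in the tables

f_stat, df_b, df_w = one_way_anova_from_summary(means, variances, N_PER_GROUP)
print(f"F({df_b}, {df_w}) = {f_stat:.3f}")
```

Under these assumptions F comes out at about 0.71 on (2, 57) degrees of freedom. An F value below 1 means the variation between the state means is smaller than the variation within states, so it cannot be significant at conventional levels, which is in line with retaining the null hypothesis for this group.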

Inferences about individual treatment means
The following hypotheses sought to establish a relationship between depression scores and the prevalence of chronic health conditions among individuals within the three states.
H_{11}: There is a relationship between the depression scores and the prevalence of chronic health conditions within the three states
H_{10}: There is no relationship between the depression scores and the prevalence of chronic health conditions within the three states
The study found that people who were 65 years of age or older and had a chronic health condition such as arthritis, hypertension, and/or a heart ailment had higher mean depression scores (Florida: 14.5, New York: 15.25, and North Carolina: 13.95) than healthy individuals. The healthy individuals scored roughly half as much. We therefore reject the null hypothesis and accept the alternative hypothesis that there is a relationship between the depression scores and the prevalence of chronic health conditions in Florida, New York and North Carolina.