# Statistics Essay Example


STATISTICS

[Figure: histogram; vertical axis: Frequency]

Question 1

b) The distribution of the students’ marks is approximately normal: the histogram is roughly symmetric, with about as many students scoring below the median as above it.

c) The frequency distribution and the histogram show that most students scored between 1400 and 1600 out of a possible 2400. The highest test score was 2200, achieved by only one student.
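The reading of the histogram in Question 1 can be reproduced with a small frequency table. This is only a sketch: the `scores` list and the class width of 200 are hypothetical stand-ins, not the assignment's data.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical SAT totals out of 2400 (NOT the assignment's data).
scores = [1050, 1230, 1310, 1420, 1450, 1480, 1500, 1520, 1550, 1580,
          1610, 1690, 1760, 1880, 2200]

BIN_WIDTH = 200  # assumed class width


def frequency_table(values, width):
    """Count how many values fall in each class [start, start + width)."""
    counts = Counter((v // width) * width for v in values)
    return dict(sorted(counts.items()))


table = frequency_table(scores, BIN_WIDTH)

# Text histogram: one '#' per student in the class.
for start, count in table.items():
    print(f"{start}-{start + BIN_WIDTH - 1}: {'#' * count}")

# The modal class is the one with the highest frequency.
modal_class = max(table, key=table.get)
print("modal class starts at", modal_class)

# A mean close to the median is one rough sign of symmetry.
print("mean:", round(mean(scores), 1), "median:", median(scores))
```

Comparing the mean with the median is a quick symmetry check; a formal normality test would need more than this.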

Question 2

b) The scatter graph of the X and Y observations reveals a definite relationship between the two variables. A linear trend line drawn through the observations shows that the likely relationship is a negative correlation: as X increases, Y decreases.
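A negative relationship like the one described can be checked numerically with the Pearson correlation coefficient and the slope of a least-squares trend line. The sketch below uses hypothetical `x` and `y` values, since the original observations are not reproduced here.

```python
from math import sqrt

# Hypothetical paired observations (NOT the assignment's data).
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [15.1, 13.8, 12.2, 11.9, 9.7, 8.4, 7.9, 6.0]


def sums_of_squares(xs, ys):
    """Return (Sxy, Sxx, Syy, mean_x, mean_y) for paired data."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy, sxx, syy, mx, my


def pearson_r(xs, ys):
    """Pearson correlation coefficient, between -1 and 1."""
    sxy, sxx, syy, _, _ = sums_of_squares(xs, ys)
    return sxy / sqrt(sxx * syy)


def trend_line(xs, ys):
    """Least-squares slope and intercept of y on x."""
    sxy, sxx, _, mx, my = sums_of_squares(xs, ys)
    slope = sxy / sxx
    return slope, my - slope * mx


r = pearson_r(x, y)
slope, intercept = trend_line(x, y)
print(f"r = {r:.3f}, trend line: y = {intercept:.2f} + ({slope:.2f}) x")
```

A negative `r` together with a negative slope is what "as X increases, Y decreases" means in numbers.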

Question 3

1. The following output was obtained from k-means clustering with k=3:

1. When the number of clusters was changed to k=2, 4 and 5, the results were as follows:

When k=2:

When k=4:

When k=5:

1. The number of clusters can be 2, 3, 4 or 5, but I would recommend k=3 for partitioning the data. With three clusters the observations lie closest to their centroids. This implies that k=3 maximizes efficiency while keeping the errors to a minimum.
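The comparison across values of k can be sketched by running k-means for each k and recording the total squared distance of the observations from their centroids. The code below is a minimal pure-Python version of Lloyd's algorithm with a deterministic start; the `points` data and the function names are hypothetical. Note that this total usually keeps shrinking as k grows, so the comparison is normally read as an "elbow" rather than a strict minimum.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, k, iterations=50):
    """Lloyd's algorithm. Returns (centroids, total within-cluster squared distance)."""
    # Deterministic start (an assumption of this sketch): k points spread through the list.
    step = len(points) // k
    centroids = [points[i * step] for i in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = [
            tuple(sum(c) / len(members) for c in zip(*members)) if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    wcss = sum(min(squared_distance(p, c) for c in centroids) for p in points)
    return centroids, wcss


# Hypothetical two-variable observations forming three loose groups.
points = [
    (1.0, 1.2), (0.8, 0.9), (1.3, 1.1), (1.1, 0.7),
    (5.0, 5.2), (4.7, 4.9), (5.3, 5.1), (5.1, 4.6),
    (9.0, 1.1), (8.8, 0.8), (9.2, 1.3), (9.1, 0.9),
]

results = {k: k_means(points, k)[1] for k in (2, 3, 4, 5)}
for k, wcss in results.items():
    print(f"k={k}: within-cluster sum of squares = {wcss:.3f}")
```

On data like this the drop from k=2 to k=3 is large and later drops are small, which is the usual argument for choosing three clusters.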

Question 4

1. When a scatter graph is drawn with distance from work as the independent variable and number of days absent as the dependent variable, the graph shows that there is indeed a relationship between the two variables. A line of best fit passes through only two of the observations, but the other observations are not far from the line. Hence a linear relationship between the two variables is plausible.

1. A regression analysis of the number of days absent on years employed and distance from work gives a regression model of the form:

1. The variation in days absent explained by the regression model is measured by R squared (the F statistic tests whether that explained share is statistically significant). The regression analysis shows the R squared value to be 0.71: that is, 71% of the variation in the dependent variable is explained by the independent variables in the model.
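For reference, the standard relationship between R squared and the overall F statistic can be written out as below. The symbols n (number of observations) and k (number of independent variables) are introduced here for illustration; the sample size is not stated in the answer, so F is left in terms of n.

```latex
R^2 = 1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}},
\qquad
F = \frac{R^2 / k}{(1 - R^2) / (n - k - 1)}

% With R^2 = 0.71 and k = 2 (years employed, distance from work):
F = \frac{0.71 / 2}{0.29 / (n - 3)} \approx 1.22\,(n - 3)
```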

Question 5

1. A regression analysis with television advertising as the independent variable shows the following results:

Therefore the regression equation would take the form:

The proportion of the variation in weekly gross revenue explained by television advertising is 43%: the R squared for the regression analysis is 0.43.

1. When the regression analysis is carried out with both newspaper and television advertising as the independent variables, the statistics show:

And the regression equation becomes:

The amount of variation in the weekly gross revenue explained by television and newspaper advertising is captured in the value of R squared.
Television and newspaper advertising together explain about 90% of the variation in the weekly gross revenue of the movie theaters.
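The comparison made in Question 5, R squared with one predictor versus two, can be sketched as an ordinary least-squares fit. The code below solves the normal equations in pure Python; the `tv`, `news` and `revenue` values are hypothetical, and `solve` and `fit_ols` are helper names invented for this illustration.

```python
def solve(a, b):
    """Solve the square system a.x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(columns, y):
    """Least-squares fit with an intercept. Returns (coefficients, r_squared)."""
    rows = [[1.0] + list(vals) for vals in zip(*columns)]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return beta, 1 - ss_res / ss_tot


# Hypothetical weekly figures (NOT the assignment's data).
tv = [2.0, 3.5, 4.0, 2.5, 5.0, 3.0, 4.5, 1.5]
news = [1.0, 2.5, 1.5, 3.0, 2.0, 4.0, 1.0, 2.0]
revenue = [88.5, 94.0, 93.5, 92.0, 97.0, 95.0, 93.5, 87.5]

beta_tv, r2_tv = fit_ols([tv], revenue)
beta_both, r2_both = fit_ols([tv, news], revenue)
print(f"TV only:        R squared = {r2_tv:.3f}")
print(f"TV + newspaper: R squared = {r2_both:.3f}")
```

Adding a predictor to a least-squares model with an intercept can never lower R squared, so the rise from one model to the other shows how much extra variation the second predictor accounts for.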

Question 6

1. The overall error rate of the full tree on the validation set, with the tree built from the training data set, was found to be:

Error Report: 15.86842

2. For a customer on a 35-month contract who selected plan 10 as his last plan, was gifted 50 GB of bonus data, used 137 GB, pays regularly, owns the modem and has no unlimited service, the observation is classified as terminal 1, hence decided. Using the pruned tree the observation of

3. The error report for the best pruned tree on the test set shows that:
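The overall error rates quoted in Question 6 are, presumably, the percentage of cases the tree classifies wrongly. A minimal sketch of that calculation from a confusion matrix follows; the counts are hypothetical and `overall_error_rate` is a name chosen for this illustration, not part of any tool's output.

```python
def overall_error_rate(confusion):
    """Percentage of cases whose predicted class differs from the actual class.

    `confusion[actual][predicted]` holds the number of cases.
    """
    total = sum(sum(row.values()) for row in confusion.values())
    wrong = sum(
        count
        for actual, row in confusion.items()
        for predicted, count in row.items()
        if predicted != actual
    )
    return 100.0 * wrong / total


# Hypothetical validation-set counts for a two-class tree (NOT the assignment's data).
validation = {
    1: {1: 180, 0: 20},   # actual class 1: 180 right, 20 wrong
    0: {1: 40, 0: 260},   # actual class 0: 40 wrong, 260 right
}

rate = overall_error_rate(validation)
print(f"overall error rate: {rate}%")  # 60 of 500 cases misclassified
```

Computing the same figure for the full tree and for the best pruned tree gives the comparison the question asks for.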

Question 7

Profit or loss is the amount of money left over, or still owed, after all costs have been deducted from the revenues.

Assuming that X nonmembers register to attend the conference, the profit model would be:

1. Using Goal Seek, the results show that the company would break even at about 50.5 nonmember attendees at the conference, that is, 51 in whole registrants. Setting the profit to zero by changing the number of registrants, Goal Seek returns:

| Item | Value |
|---|---|
| price per person | |
| number of registrants | 50.4950495 |
| cost per person | |
| fixed costs | 3787.128713 |
| variable costs | 1237.128713 |
| profit | 0 |
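Goal Seek amounts to a numerical root search: adjust the number of registrants until profit reaches the target of zero. The sketch below does the same with bisection. The price, cost per person and fixed cost values are assumptions, chosen only because they reproduce the 50.495 registrants and 1237.13 variable costs shown in the Goal Seek output; they are not figures taken from the worksheet.

```python
from math import ceil

# ASSUMED inputs (not from the original worksheet).
PRICE_PER_PERSON = 75.0
COST_PER_PERSON = 24.5
FIXED_COSTS = 2550.0


def profit(registrants):
    """Revenue minus fixed and variable costs for a given number of registrants."""
    revenue = PRICE_PER_PERSON * registrants
    variable_costs = COST_PER_PERSON * registrants
    return revenue - (FIXED_COSTS + variable_costs)


def goal_seek(f, target, low, high, tolerance=1e-9):
    """Bisection search for x in [low, high] with f(x) == target.

    Requires f(low) and f(high) to lie on opposite sides of the target.
    """
    if (f(low) - target) * (f(high) - target) > 0:
        raise ValueError("target is not bracketed by low and high")
    while high - low > tolerance:
        mid = (low + high) / 2
        if (f(low) - target) * (f(mid) - target) <= 0:
            high = mid
        else:
            low = mid
    return (low + high) / 2


break_even = goal_seek(profit, target=0.0, low=0.0, high=1000.0)
print(f"break-even registrants: {break_even:.4f}")
print(f"whole registrants needed to avoid a loss: {ceil(break_even)}")
```

Because the profit model is linear, the same answer follows directly from fixed costs divided by the margin per person; the search is shown because it mirrors what Goal Seek does.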