Business Analytics Essay Example
 Category:Business
 Document type:Math Problem
 Level:Masters
 Page:3
 Words:2059
Business Economics 20
BUSINESS ECONOMICS
Business Economics

Approximately 1.65 million high school students take the Scholastic Aptitude Test (SAT) each year, and nearly 80 percent of the college and universities without open admission policies use SAT scores in making admission decisions. The current version of the SAT includes three parts: reading comprehension, mathematics, and writing.
A perfect combined score for all three parts is 2400. A sample of SAT scores for the combined threepart SAT are as follows:

Show a frequency distribution and histogram. Begin with the first bin starting at 800, and use a bin width of 200.
Frequency 

0 


Comment on the shape of the distribution.
The shape of the histogram is symmetrical in that if it was to be divided into halves from the 1600 mark, the two parts would be equal. Additionally, it has seven peaks which decrease uniformly from the 1600 mark either way.

What other observations can be made about the SAT scores based on the tabular and graphical summaries?
Most of students score a combined score of between 1400 and 1800. Furthermore, there are few students scoring a combined score of less than or equal to 1000. Similarly, only few students score more than 2000 in all the three combined areas.
(2+3+5 = 10 marks)
The following 20 observations are for two quantitative variables, x and y.
Observation 
Observation 


Create a scatter chart for these 20 observations.

Fit a linear trendline to the 20 observations. What can you say about the relationship between the two quantitative variables?
The relationship between X and Y variables for the 20 observations is inverse. This means that an increase in value of Y leads to a decrease in values of X and vice versa is true.
(5+5 = 10 marks)
Use the data file DemoKTCfile to conduct the following analysis:

Use kmeans clustering with a value of k = 3 to cluster based on the Age, Income, and Children variables.
17524.672678 

30101.739344 

16626.400000 

19324.588525 

49559.488525 

37926.600000 

8899.070000 

25004.600000 

25341.300000 

24266.100000 

59869.900000 

26710.800000 

15779.800000 

55270.700000 

19510.600000 

22380.100000 

17766.800000 

41062.000000 

26971.200000 

22553.800000 

57941.700000 

16547.300000 

38500.600000 

15565.800000 

12662.300000 

41090.000000 

20854.700000 

20153.000000 

29398.100000 

24331.100000 

Repeat the kmeans clustering for values of k=2, 4, 5.

How many clusters do you recommend? Why?
Only two clusters can be recommended since the number of calculation increases very fast as the cluster increases which may lead to errors in generation iterations. Given the large data in this case, two clusters are appropriate.
(8+ 6+6 = 20 marks)
A sociologist was hired by a large city hospital to investigate the relationship between the number of the unauthorized days that employees are absent per year, the distance (miles) between home and work for the employees and the level years of employees. A sample of 10 employees was chosen, and the following data were collected.
Distance to work (miles) 
No. of Days Absent 
Level years of employed 
LongCareer 

MidCareer 

LongCareer 

LongCareer 

MidCareer 

MidCareer 

MidCareer 


Develop a scatter chart for these data (consider Distance to work as independent variable and number of days absent as dependent variable). Does a linear relationship appear reasonable? Explain.
The linear relationship can be explained as follows: Y=0.344X + 8.097. The relationship is reasonable. For instance assuming that someone stays 1 mile way from work, the number that he/she will be absent as per the equation are: (1*0.344+8.097)= 7.75 (close to 8 days). If the person stays 10 miles way the days that he/she will be absent will be: )10*0.344+8.097)= 4.65(close to 5 days). This is reasonable as be the day on the table which implies that as people stay close to the place of work have more absent days.

Use the data to develop an estimated regression equation that could be used to predict the number of days absent given the distance to work and level years of employed. What is the estimated regression model?
The estimated regression equation is Y=0.344X+8.097

How much of the variation in the sample values of number of days absent does the model you estimated in part (b) explain?
The coefficient of determination, r2 is used to measure the variation of one variable in relationship to the other. The coefficient of determination, r2 for this case is 0.710854. The value is close to 1 (perfect correlation). This indicates that it is possible to predict the number of days a person can be absent. Furthermore, the Fvalue is 19.66767 and pvalue is 0.002183. If the p value was to be zero, it would indicate that there is no relationship between distance and absenteeism but as the value gets close to zero, it is an indication that the two variables are related. Thus, the values indicate that there is a close relationship between distance of workplace and absenteeism.
(5+5+5 = 15 marks)
Dixie Showtime Movie Theaters, Inc., owns and operates a chain of cinemas in several markets in the southern United States. The owners would like to estimate weekly gross revenue as a function of advertising expenditures. Data for a sample of eight markets for a recent week follow:
Weekly Gross Revenue ($100s) 
Television Advertising 
Newspaper Advertising 

Shreveport 

Birmingham 

Little Rock 

New Orleans 

Baton Rouge 

Develop an estimated regression equation with the amount of television advertising as the independent variable. How many of the variation in the sample values of weekly gross revenue does the model explain?
SUMMARY OUTPUT 

Regression Statistics 

Multiple R 
0.656478 

0.430963 

Adjusted R Square 
0.336124 

Standard Error 
53.78118 

Observations 

Significance F 

Regression 
13143.51 
13143.51 
0.077029 

Residual 
17354.49 
2892.415 

Coefficients 
Standard Error 
Lower 95% 
Upper 95% 
Lower 95.0% 
Upper 95.0% 

Intercept 
18.3625 
72.76012 
0.25237 
0.809177 
159.6751 
159.6751 

X Variable 1 
33.36213 
2.131696 
0.077029 
4.93328 
71.65753 
4.93328 
71.65753 
The R2 value for the equation is 0.43 which is not a very good fit. This value indicates a variation of 43% the amount of television adverts and weekly gross revenue. The variation is less than 50% which makes it hard to predict the relationship between weekly gross income and television adverts

Develop an estimated regression equation with both television advertising and newspaper advertising as the independent variables. How much of the variation in the sample value of weekly gross revenue does the model explain?
SUMMARY OUTPUT 

Regression Statistics 

Multiple R 
0.951235 

0.904847 

Adjusted R Square 
0.866786 

Standard Error 
24.09135 

Observations 

Significance F 

Regression 
27596.03 
13798.02 
23.77356 
0.002793 

Residual 
2901.966 
580.3933 

Coefficients 
Standard Error 
Lower 95% 
Upper 95% 
Lower 95.0% 
Upper 95.0% 

Intercept 
41.4073 
32.91854 
1.25787 
0.263985 
126.027 
43.21253 
126.027 
43.21253 
X Variable 1 
21.93773 
4.396237 
4.990116 
10.63685 
33.23862 
10.63685 
33.23862 

X Variable 2 
19.00402 
7.578148 
2.507739 
0.053977 
0.47623 
38.48427 
0.47623 
38.48427 
(5+5 = 10 marks)
The value of R2 in this case is 0.90 which is very good fit. The 90% variation indicates that there is a perfect correlation between television and newspaper adverts with weekly gross income. It also means that model almost all the variability of television and newspaper adverts around the mean.
An internet provider company in Australia is interested in identifying the reason for individuals who are still undecided in buying the new NBN service of the company. The file NBNservice contains data on a sample of customers with track variables.
Create a standard partition of the data with all tracked variables and 40% of observations in the training set, 35% in the validation set, and 25% in the test set. Fit a single classification tree using contract duration (month), last plan, bonus data (GB), usage (GB), regular payment, have modem, and unlimited service as input variables and undecided as the output variable. In step 2 of XLMiner’s classification tree procedure, be sure to Normalize Input Data and to set the Minimum #records in a terminal node to 250. Generate the full tree and best pruned tree.

From the CTOutput worksheet, what is the overall error rate of the full tree on the validation set?
After standard partitioning, there were 7 classes misclassified in the last plan data. This resulted to a percentage error of 3.96

Consider a 35month contract customer who has selected plan 10 as his last plan, gifted bonus data of 50 GB, with usage of 137 GB, with regular payment, owns the modem and without unlimited service. Using the CT_PruneTree worksheet, does the bestpruned tree classify this observation as undecided?
In the red line connecting origin to the blue line draws the number of cases with the variables. There is a great area between the lift curve and the baseline which indicates a better model meaning that the observations are undecided.

For the default cutoff value of 0.5, what are the overall error rate, class 1 error rate, and class 0 error rate of the bestpruned tree on the test set?
The ROC curves plots true positive rates (TPR) against false positive rates (FPR) with the cut off values moving from 0 to1. With the cut off of 0.4, the Areas under the curve (AUC) is very close to 1 indicating that the model is good fit. Class 1 and class 0 errors rates are therefore very minimal.
(5+10+5=20 marks)

(Answer this question or Question 8, if must not answer both, if you answer this question I don’t mark question 8 for you).
The university of Cincinnati Center for business Analytics is an outreach center that collaborates with industry partners on applied research and continuing education in business analytics. One of the programs offered by the center is a quarterly Business Intelligence Symposium. Each symposium features three speakers on the realworld use of analytics.
Each corporates member of the center (there are currently 10) receives five free seats to each symposium. Nonmembers wishing to attend must pay $75 per person. each attendee receives breakfast, lunch, and free parking. The following are the costs incurred for putting on this event:

Rental cost for the auditorium
Registration processing
$8.50 per person
Speaker costs
[email protected]$800 = $2400
Continental breakfast
$4.00 per person
$7.00 per person
$5.00 per person

Build a spreadsheet model that calculates a profit or loss based on the number of nonmember registrants.
Expenses 
Price per unit 

Rental for auditorium 

Regitstration process 

Speaker cost 
$2,400.00 

Continental breakfast 

Total expenses 
$3,775.00 

Attendance fee 
$3,000.00 

Total Income 
$3,000.00 

$775.00 
Assuming that all the seats were occupied, the company will have a total of 50 attendants. 10 of the attendants will be corporate members who will not pay attendant fee. The remaining 40 attendants will pay $75 which is the only income the company has. After subtracting expenses from the income, the company will make a loss of $755

Use Goal Seek to find the number of nonmember registrants that will make the event break even.
Expenses 
Price per unit 

Rental for auditorium 

Regitstration process 

Speaker cost 
$2,400.00 

Continental breakfast 

Total expenses 
$3,775.00 

Attendance fee 
100.6667 
$7,550.00 

Total Income 
$7,550.00 

$3,775.00 
For the company to experience break even (no profit or loss) the total number of attendants must be 101.
(7+8=15 marks)

(Answer this question or Question 7, if must not answer both, if you have answered question 7 I don’t mark this question).
Great Southern is a 117room hotel located near the Convention and Exhibition Centre in Melbourne. The Meetings & Events Australia Ltd has planned its annual exhibition in Melbourne for the last weekend in April. Great Southern has agreed to make at least onethird of its rooms available for exhibition attendees at a special exhibition rate in order to be listed as a recommended hotel for the exhibition. Although the majority of attendees at the annual exhibition typically request a Saturday night package, some of them may choose a Friday and Saturday twonight package reservation. Travellers not attending the exhibition may also request the same day reservations as well as Friday night only one. Thus, 5 kinds of reservations are likely and have the following costs and expected demands:
Ordinary (O) 
Exhibition (E) 

Friday night only (F) 

Saturday night only (S) 

Friday & Saturday Twonight package(FS) 

How many rooms Great Southern can determine to make available for each type of reservation in order to maximise total profit?

Suppose that one week before the exhibition, the number of Exhibition travellers/ Friday & Saturday twonight package rooms that were made available sell out. If an exhibition attendee calls and requests a Friday & Saturday twonight package room, should Great Southern accept this booking? If an ordinary traveller demands this option simultaneously, which one would be better to select?
(7+8=15 marks)
TOTAL MARKS= 100