Business Analytics Essay Example
- Category:Business
- Document type:Math Problem
- Level:Masters
- Page:3
- Words:2059
Business Economics 20
BUSINESS ECONOMICS
Business Economics
-
Approximately 1.65 million high school students take the Scholastic Aptitude Test (SAT) each year, and nearly 80 percent of the college and universities without open admission policies use SAT scores in making admission decisions. The current version of the SAT includes three parts: reading comprehension, mathematics, and writing.
A perfect combined score for all three parts is 2400. A sample of SAT scores for the combined three-part SAT are as follows:
-
Show a frequency distribution and histogram. Begin with the first bin starting at 800, and use a bin width of 200.
Frequency |
|
0 |
|
-
Comment on the shape of the distribution.
The shape of the histogram is symmetrical in that if it was to be divided into halves from the 1600 mark, the two parts would be equal. Additionally, it has seven peaks which decrease uniformly from the 1600 mark either way.
-
What other observations can be made about the SAT scores based on the tabular and graphical summaries?
Most of students score a combined score of between 1400 and 1800. Furthermore, there are few students scoring a combined score of less than or equal to 1000. Similarly, only few students score more than 2000 in all the three combined areas.
(2+3+5 = 10 marks)
The following 20 observations are for two quantitative variables, x and y.
Observation |
Observation |
||||
-
Create a scatter chart for these 20 observations.
-
Fit a linear trendline to the 20 observations. What can you say about the relationship between the two quantitative variables?
The relationship between X and Y variables for the 20 observations is inverse. This means that an increase in value of Y leads to a decrease in values of X and vice versa is true.
(5+5 = 10 marks)
Use the data file DemoKTCfile to conduct the following analysis:
-
Use k-means clustering with a value of k = 3 to cluster based on the Age, Income, and Children variables.
17524.672678 |
||
30101.739344 |
||
16626.400000 |
||
19324.588525 |
||
49559.488525 |
||
37926.600000 |
||
8899.070000 |
||
25004.600000 |
||
25341.300000 |
||
24266.100000 |
||
59869.900000 |
||
26710.800000 |
||
15779.800000 |
||
55270.700000 |
||
19510.600000 |
||
22380.100000 |
||
17766.800000 |
||
41062.000000 |
||
26971.200000 |
||
22553.800000 |
||
57941.700000 |
||
16547.300000 |
||
38500.600000 |
||
15565.800000 |
||
12662.300000 |
||
41090.000000 |
||
20854.700000 |
||
20153.000000 |
||
29398.100000 |
||
24331.100000 |
-
Repeat the k-means clustering for values of k=2, 4, 5.
-
How many clusters do you recommend? Why?
Only two clusters can be recommended since the number of calculation increases very fast as the cluster increases which may lead to errors in generation iterations. Given the large data in this case, two clusters are appropriate.
(8+ 6+6 = 20 marks)
A sociologist was hired by a large city hospital to investigate the relationship between the number of the unauthorized days that employees are absent per year, the distance (miles) between home and work for the employees and the level years of employees. A sample of 10 employees was chosen, and the following data were collected.
Distance to work (miles) |
No. of Days Absent |
Level years of employed |
Long-Career |
||
Mid-Career |
||
Long-Career |
||
Long-Career |
||
Mid-Career |
||
Mid-Career |
||
Mid-Career |
||
-
Develop a scatter chart for these data (consider Distance to work as independent variable and number of days absent as dependent variable). Does a linear relationship appear reasonable? Explain.
The linear relationship can be explained as follows: Y=-0.344X + 8.097. The relationship is reasonable. For instance assuming that someone stays 1 mile way from work, the number that he/she will be absent as per the equation are: (1*-0.344+8.097)= 7.75 (close to 8 days). If the person stays 10 miles way the days that he/she will be absent will be: )10*-0.344+8.097)= 4.65(close to 5 days). This is reasonable as be the day on the table which implies that as people stay close to the place of work have more absent days.
-
Use the data to develop an estimated regression equation that could be used to predict the number of days absent given the distance to work and level years of employed. What is the estimated regression model?
The estimated regression equation is Y=-0.344X+8.097
-
How much of the variation in the sample values of number of days absent does the model you estimated in part (b) explain?
The coefficient of determination, r2 is used to measure the variation of one variable in relationship to the other. The coefficient of determination, r2 for this case is 0.710854. The value is close to 1 (perfect correlation). This indicates that it is possible to predict the number of days a person can be absent. Furthermore, the F-value is 19.66767 and p-value is 0.002183. If the p value was to be zero, it would indicate that there is no relationship between distance and absenteeism but as the value gets close to zero, it is an indication that the two variables are related. Thus, the values indicate that there is a close relationship between distance of workplace and absenteeism.
(5+5+5 = 15 marks)
Dixie Showtime Movie Theaters, Inc., owns and operates a chain of cinemas in several markets in the southern United States. The owners would like to estimate weekly gross revenue as a function of advertising expenditures. Data for a sample of eight markets for a recent week follow:
Weekly Gross Revenue ($100s) |
Television Advertising |
Newspaper Advertising |
|
Shreveport |
|||
Birmingham |
|||
Little Rock |
|||
New Orleans |
|||
Baton Rouge |
-
Develop an estimated regression equation with the amount of television advertising as the independent variable. How many of the variation in the sample values of weekly gross revenue does the model explain?
SUMMARY OUTPUT |
|||||||||
Regression Statistics |
|||||||||
Multiple R |
0.656478 |
||||||||
0.430963 |
|||||||||
Adjusted R Square |
0.336124 |
||||||||
Standard Error |
53.78118 |
||||||||
Observations |
|||||||||
Significance F |
|||||||||
Regression |
13143.51 |
13143.51 |
0.077029 |
||||||
Residual |
17354.49 |
2892.415 |
|||||||
Coefficients |
Standard Error |
Lower 95% |
Upper 95% |
Lower 95.0% |
Upper 95.0% |
||||
Intercept |
-18.3625 |
72.76012 |
-0.25237 |
0.809177 |
159.6751 |
159.6751 |
|||
X Variable 1 |
33.36213 |
2.131696 |
0.077029 |
-4.93328 |
71.65753 |
-4.93328 |
71.65753 |
The R2 value for the equation is 0.43 which is not a very good fit. This value indicates a variation of 43% the amount of television adverts and weekly gross revenue. The variation is less than 50% which makes it hard to predict the relationship between weekly gross income and television adverts
-
Develop an estimated regression equation with both television advertising and newspaper advertising as the independent variables. How much of the variation in the sample value of weekly gross revenue does the model explain?
SUMMARY OUTPUT |
||||||||
Regression Statistics |
||||||||
Multiple R |
0.951235 |
|||||||
0.904847 |
||||||||
Adjusted R Square |
0.866786 |
|||||||
Standard Error |
24.09135 |
|||||||
Observations |
||||||||
Significance F |
||||||||
Regression |
27596.03 |
13798.02 |
23.77356 |
0.002793 |
||||
Residual |
2901.966 |
580.3933 |
||||||
Coefficients |
Standard Error |
Lower 95% |
Upper 95% |
Lower 95.0% |
Upper 95.0% |
|||
Intercept |
-41.4073 |
32.91854 |
-1.25787 |
0.263985 |
-126.027 |
43.21253 |
-126.027 |
43.21253 |
X Variable 1 |
21.93773 |
4.396237 |
4.990116 |
10.63685 |
33.23862 |
10.63685 |
33.23862 |
|
X Variable 2 |
19.00402 |
7.578148 |
2.507739 |
0.053977 |
-0.47623 |
38.48427 |
-0.47623 |
38.48427 |
(5+5 = 10 marks)
The value of R2 in this case is 0.90 which is very good fit. The 90% variation indicates that there is a perfect correlation between television and newspaper adverts with weekly gross income. It also means that model almost all the variability of television and newspaper adverts around the mean.
An internet provider company in Australia is interested in identifying the reason for individuals who are still undecided in buying the new NBN service of the company. The file NBN-service contains data on a sample of customers with track variables.
Create a standard partition of the data with all tracked variables and 40% of observations in the training set, 35% in the validation set, and 25% in the test set. Fit a single classification tree using contract duration (month), last plan, bonus data (GB), usage (GB), regular payment, have modem, and unlimited service as input variables and undecided as the output variable. In step 2 of XLMiner’s classification tree procedure, be sure to Normalize Input Data and to set the Minimum #records in a terminal node to 250. Generate the full tree and best pruned tree.
-
From the CT-Output worksheet, what is the overall error rate of the full tree on the validation set?
After standard partitioning, there were 7 classes misclassified in the last plan data. This resulted to a percentage error of 3.96
-
Consider a 35-month contract customer who has selected plan 10 as his last plan, gifted bonus data of 50 GB, with usage of 137 GB, with regular payment, owns the modem and without unlimited service. Using the CT_PruneTree worksheet, does the best-pruned tree classify this observation as undecided?
In the red line connecting origin to the blue line draws the number of cases with the variables. There is a great area between the lift curve and the baseline which indicates a better model meaning that the observations are undecided.
-
For the default cut-off value of 0.5, what are the overall error rate, class 1 error rate, and class 0 error rate of the best-pruned tree on the test set?
The ROC curves plots true positive rates (TPR) against false positive rates (FPR) with the cut off values moving from 0 to1. With the cut off of 0.4, the Areas under the curve (AUC) is very close to 1 indicating that the model is good fit. Class 1 and class 0 errors rates are therefore very minimal.
(5+10+5=20 marks)
-
(Answer this question or Question 8, if must not answer both, if you answer this question I don’t mark question 8 for you).
The university of Cincinnati Center for business Analytics is an outreach center that collaborates with industry partners on applied research and continuing education in business analytics. One of the programs offered by the center is a quarterly Business Intelligence Symposium. Each symposium features three speakers on the real-world use of analytics.
Each corporates member of the center (there are currently 10) receives five free seats to each symposium. Nonmembers wishing to attend must pay $75 per person. each attendee receives breakfast, lunch, and free parking. The following are the costs incurred for putting on this event:
-
Rental cost for the auditorium
Registration processing
$8.50 per person
Speaker costs
[email protected]$800 = $2400
Continental breakfast
$4.00 per person
$7.00 per person
$5.00 per person
-
Build a spreadsheet model that calculates a profit or loss based on the number of nonmember registrants.
Expenses |
Price per unit |
||
Rental for auditorium |
|||
Regitstration process |
|||
Speaker cost |
$2,400.00 |
||
Continental breakfast |
|||
Total expenses |
$3,775.00 |
||
Attendance fee |
$3,000.00 |
||
Total Income |
$3,000.00 |
||
-$775.00 |
Assuming that all the seats were occupied, the company will have a total of 50 attendants. 10 of the attendants will be corporate members who will not pay attendant fee. The remaining 40 attendants will pay $75 which is the only income the company has. After subtracting expenses from the income, the company will make a loss of $755
-
Use Goal Seek to find the number of nonmember registrants that will make the event break even.
Expenses |
Price per unit |
||
Rental for auditorium |
|||
Regitstration process |
|||
Speaker cost |
$2,400.00 |
||
Continental breakfast |
|||
Total expenses |
$3,775.00 |
||
Attendance fee |
100.6667 |
$7,550.00 |
|
Total Income |
$7,550.00 |
||
$3,775.00 |
For the company to experience break even (no profit or loss) the total number of attendants must be 101.
(7+8=15 marks)
-
(Answer this question or Question 7, if must not answer both, if you have answered question 7 I don’t mark this question).
Great Southern is a 117-room hotel located near the Convention and Exhibition Centre in Melbourne. The Meetings & Events Australia Ltd has planned its annual exhibition in Melbourne for the last weekend in April. Great Southern has agreed to make at least one-third of its rooms available for exhibition attendees at a special exhibition rate in order to be listed as a recommended hotel for the exhibition. Although the majority of attendees at the annual exhibition typically request a Saturday night package, some of them may choose a Friday and Saturday two-night package reservation. Travellers not attending the exhibition may also request the same day reservations as well as Friday night only one. Thus, 5 kinds of reservations are likely and have the following costs and expected demands:
Ordinary (O) |
Exhibition (E) |
|||
Friday night only (F) |
||||
Saturday night only (S) |
||||
Friday & Saturday Two-night package(FS) |
-
How many rooms Great Southern can determine to make available for each type of reservation in order to maximise total profit?
-
Suppose that one week before the exhibition, the number of Exhibition travellers/ Friday & Saturday two-night package rooms that were made available sell out. If an exhibition attendee calls and requests a Friday & Saturday two-night package room, should Great Southern accept this booking? If an ordinary traveller demands this option simultaneously, which one would be better to select?
(7+8=15 marks)
TOTAL MARKS= 100