# Business Analytics Essay Example

• Category:
Business
• Document type:
Math Problem
• Level:
Masters
• Page:
3
• Words:
2059

Business Economics 20

BUSINESS ECONOMICS

Business Economics

1. Approximately 1.65 million high school students take the Scholastic Aptitude Test (SAT) each year, and nearly 80 percent of the college and universities without open admission policies use SAT scores in making admission decisions. The current version of the SAT includes three parts: reading comprehension, mathematics, and writing.

A perfect combined score for all three parts is 2400. A sample of SAT scores for the combined three-part SAT are as follows:

1. Show a frequency distribution and histogram. Begin with the first bin starting at 800, and use a bin width of 200.

 Frequency 0

1. Comment on the shape of the distribution.

The shape of the histogram is symmetrical in that if it was to be divided into halves from the 1600 mark, the two parts would be equal. Additionally, it has seven peaks which decrease uniformly from the 1600 mark either way.

1. What other observations can be made about the SAT scores based on the tabular and graphical summaries?

Most of students score a combined score of between 1400 and 1800. Furthermore, there are few students scoring a combined score of less than or equal to 1000. Similarly, only few students score more than 2000 in all the three combined areas.

(2+3+5 = 10 marks)

The following 20 observations are for two quantitative variables, x and y.

 Observation Observation
1. Create a scatter chart for these 20 observations.

1. Fit a linear trendline to the 20 observations. What can you say about the relationship between the two quantitative variables?

The relationship between X and Y variables for the 20 observations is inverse. This means that an increase in value of Y leads to a decrease in values of X and vice versa is true.

(5+5 = 10 marks)

Use the data file DemoKTCfile to conduct the following analysis:

1. Use k-means clustering with a value of k = 3 to cluster based on the Age, Income, and Children variables.

 17524.672678 30101.739344 16626.400000 19324.588525 49559.488525 37926.600000 8899.070000 25004.600000 25341.300000 24266.100000 59869.900000 26710.800000 15779.800000 55270.700000 19510.600000 22380.100000 17766.800000 41062.000000 26971.200000 22553.800000 57941.700000 16547.300000 38500.600000 15565.800000 12662.300000 41090.000000 20854.700000 20153.000000 29398.100000 24331.100000
1. Repeat the k-means clustering for values of k=2, 4, 5.

1. How many clusters do you recommend? Why?

Only two clusters can be recommended since the number of calculation increases very fast as the cluster increases which may lead to errors in generation iterations. Given the large data in this case, two clusters are appropriate.

(8+ 6+6 = 20 marks)

A sociologist was hired by a large city hospital to investigate the relationship between the number of the unauthorized days that employees are absent per year, the distance (miles) between home and work for the employees and the level years of employees. A sample of 10 employees was chosen, and the following data were collected.

 Distance to work (miles) No. of Days Absent Level years of employed Long-Career Mid-Career Long-Career Long-Career Mid-Career Mid-Career Mid-Career
1. Develop a scatter chart for these data (consider Distance to work as independent variable and number of days absent as dependent variable). Does a linear relationship appear reasonable? Explain.

The linear relationship can be explained as follows: Y=-0.344X + 8.097. The relationship is reasonable. For instance assuming that someone stays 1 mile way from work, the number that he/she will be absent as per the equation are: (1*-0.344+8.097)= 7.75 (close to 8 days). If the person stays 10 miles way the days that he/she will be absent will be: )10*-0.344+8.097)= 4.65(close to 5 days). This is reasonable as be the day on the table which implies that as people stay close to the place of work have more absent days.

1. Use the data to develop an estimated regression equation that could be used to predict the number of days absent given the distance to work and level years of employed. What is the estimated regression model?

The estimated regression equation is Y=-0.344X+8.097

1. How much of the variation in the sample values of number of days absent does the model you estimated in part (b) explain?

The coefficient of determination, r2 is used to measure the variation of one variable in relationship to the other. The coefficient of determination, r2 for this case is 0.710854. The value is close to 1 (perfect correlation). This indicates that it is possible to predict the number of days a person can be absent. Furthermore, the F-value is 19.66767 and p-value is 0.002183. If the p value was to be zero, it would indicate that there is no relationship between distance and absenteeism but as the value gets close to zero, it is an indication that the two variables are related. Thus, the values indicate that there is a close relationship between distance of workplace and absenteeism.

(5+5+5 = 15 marks)

Dixie Showtime Movie Theaters, Inc., owns and operates a chain of cinemas in several markets in the southern United States. The owners would like to estimate weekly gross revenue as a function of advertising expenditures. Data for a sample of eight markets for a recent week follow:

 Weekly Gross Revenue (\$100s) Television Advertising Newspaper Advertising Shreveport Birmingham Little Rock New Orleans Baton Rouge
1. Develop an estimated regression equation with the amount of television advertising as the independent variable. How many of the variation in the sample values of weekly gross revenue does the model explain?

 SUMMARY OUTPUT Regression Statistics Multiple R 0.656478 0.430963 Adjusted R Square 0.336124 Standard Error 53.78118 Observations Significance F Regression 13143.51 13143.51 0.077029 Residual 17354.49 2892.415 Coefficients Standard Error Lower 95% Upper 95% Lower 95.0% Upper 95.0% Intercept -18.3625 72.76012 -0.25237 0.809177 159.6751 159.6751 X Variable 1 33.36213 2.131696 0.077029 -4.93328 71.65753 -4.93328 71.65753

The R2 value for the equation is 0.43 which is not a very good fit. This value indicates a variation of 43% the amount of television adverts and weekly gross revenue. The variation is less than 50% which makes it hard to predict the relationship between weekly gross income and television adverts

1. Develop an estimated regression equation with both television advertising and newspaper advertising as the independent variables. How much of the variation in the sample value of weekly gross revenue does the model explain?

 SUMMARY OUTPUT Regression Statistics Multiple R 0.951235 0.904847 Adjusted R Square 0.866786 Standard Error 24.09135 Observations Significance F Regression 27596.03 13798.02 23.77356 0.002793 Residual 2901.966 580.3933 Coefficients Standard Error Lower 95% Upper 95% Lower 95.0% Upper 95.0% Intercept -41.4073 32.91854 -1.25787 0.263985 -126.027 43.21253 -126.027 43.21253 X Variable 1 21.93773 4.396237 4.990116 10.63685 33.23862 10.63685 33.23862 X Variable 2 19.00402 7.578148 2.507739 0.053977 -0.47623 38.48427 -0.47623 38.48427

(5+5 = 10 marks)

The value of R2 in this case is 0.90 which is very good fit. The 90% variation indicates that there is a perfect correlation between television and newspaper adverts with weekly gross income. It also means that model almost all the variability of television and newspaper adverts around the mean.

An internet provider company in Australia is interested in identifying the reason for individuals who are still undecided in buying the new NBN service of the company. The file NBN-service contains data on a sample of customers with track variables.

Create a standard partition of the data with all tracked variables and 40% of observations in the training set, 35% in the validation set, and 25% in the test set. Fit a single classification tree using contract duration (month), last plan, bonus data (GB), usage (GB), regular payment, have modem, and unlimited service as input variables and undecided as the output variable. In step 2 of XLMiner’s classification tree procedure, be sure to Normalize Input Data and to set the Minimum #records in a terminal node to 250. Generate the full tree and best pruned tree.

1. From the CT-Output worksheet, what is the overall error rate of the full tree on the validation set?

After standard partitioning, there were 7 classes misclassified in the last plan data. This resulted to a percentage error of 3.96

1. Consider a 35-month contract customer who has selected plan 10 as his last plan, gifted bonus data of 50 GB, with usage of 137 GB, with regular payment, owns the modem and without unlimited service. Using the CT_PruneTree worksheet, does the best-pruned tree classify this observation as undecided?

In the red line connecting origin to the blue line draws the number of cases with the variables. There is a great area between the lift curve and the baseline which indicates a better model meaning that the observations are undecided.

1. For the default cut-off value of 0.5, what are the overall error rate, class 1 error rate, and class 0 error rate of the best-pruned tree on the test set?

The ROC curves plots true positive rates (TPR) against false positive rates (FPR) with the cut off values moving from 0 to1. With the cut off of 0.4, the Areas under the curve (AUC) is very close to 1 indicating that the model is good fit. Class 1 and class 0 errors rates are therefore very minimal.

(5+10+5=20 marks)

1. (Answer this question or Question 8, if must not answer both, if you answer this question I don’t mark question 8 for you).

The university of Cincinnati Center for business Analytics is an outreach center that collaborates with industry partners on applied research and continuing education in business analytics. One of the programs offered by the center is a quarterly Business Intelligence Symposium. Each symposium features three speakers on the real-world use of analytics.

Each corporates member of the center (there are currently 10) receives five free seats to each symposium. Nonmembers wishing to attend must pay \$75 per person. each attendee receives breakfast, lunch, and free parking. The following are the costs incurred for putting on this event:

 Rental cost for the auditorium Registration processing \$8.50 per person Speaker costs [email protected]\$800 = \$2400 Continental breakfast \$4.00 per person \$7.00 per person \$5.00 per person
1. Build a spreadsheet model that calculates a profit or loss based on the number of nonmember registrants.

 Expenses Price per unit Rental for auditorium Regitstration process Speaker cost \$2,400.00 Continental breakfast Total expenses \$3,775.00 Attendance fee \$3,000.00 Total Income \$3,000.00 -\$775.00

Assuming that all the seats were occupied, the company will have a total of 50 attendants. 10 of the attendants will be corporate members who will not pay attendant fee. The remaining 40 attendants will pay \$75 which is the only income the company has. After subtracting expenses from the income, the company will make a loss of \$755

1. Use Goal Seek to find the number of nonmember registrants that will make the event break even.

 Expenses Price per unit Rental for auditorium Regitstration process Speaker cost \$2,400.00 Continental breakfast Total expenses \$3,775.00 Attendance fee 100.6667 \$7,550.00 Total Income \$7,550.00 \$3,775.00

For the company to experience break even (no profit or loss) the total number of attendants must be 101.

(7+8=15 marks)

1. (Answer this question or Question 7, if must not answer both, if you have answered question 7 I don’t mark this question).

Great Southern is a 117-room hotel located near the Convention and Exhibition Centre in Melbourne. The Meetings & Events Australia Ltd has planned its annual exhibition in Melbourne for the last weekend in April. Great Southern has agreed to make at least one-third of its rooms available for exhibition attendees at a special exhibition rate in order to be listed as a recommended hotel for the exhibition. Although the majority of attendees at the annual exhibition typically request a Saturday night package, some of them may choose a Friday and Saturday two-night package reservation. Travellers not attending the exhibition may also request the same day reservations as well as Friday night only one. Thus, 5 kinds of reservations are likely and have the following costs and expected demands:

 Ordinary (O) Exhibition (E) Friday night only (F) Saturday night only (S) Friday & Saturday Two-night package(FS)
1. How many rooms Great Southern can determine to make available for each type of reservation in order to maximise total profit?

2. Suppose that one week before the exhibition, the number of Exhibition travellers/ Friday & Saturday two-night package rooms that were made available sell out. If an exhibition attendee calls and requests a Friday & Saturday two-night package room, should Great Southern accept this booking? If an ordinary traveller demands this option simultaneously, which one would be better to select?

(7+8=15 marks)

TOTAL MARKS= 100