Problem 1: Decision analysis using decision tables
George Keneally is a keen investor and likes to invest in the stock market. The return he expects from investing in the stock market will depend on the state of the market. He estimates that if the market is good he will get a 12% return; if the market is fair he will get a 5% return, and if the market is poor he will get a -2% return (i.e., a loss). Over the last few months George has been feeling more cautious, and is now considering whether he should instead invest his money in government bonds, which offer a fixed return of 6% per annum. George has $100,000 to invest, and wishes to invest it either all in stocks, or all in bonds.
Decision tables are often an appropriate modelling technique to use when trying to find the best solution from a small number of alternatives. Develop a decision table for this problem and use it to answer the following questions:
1. What is the maximax decision; i.e., the decision George would make if he were optimistic? Make sure you explain why you gave your answer.
2. What is the maximin decision; i.e., the decision he would make if he were pessimistic? Again, justify your answer.
3. What decision would George make if he believed that each of the three states of the market were equally likely? You must show all calculations, and justify your answer based on these calculations.
In fact, the probabilities of the three states of the market are not equal. Rather, the probability that the market will be good is 50%, the probability that it will be fair is 30% and the probability that it will be poor is 20%.
4. What decision would George make once he has been given this information? (Show all your calculations, and justify your answer).
A friend of George has referred him to a consultant who is able to predict with certainty whether the market will be good, fair, or poor. The consultant would charge $2,000 for this information.
5. Should George pay the consultant? What is the most that George should be willing to pay for the consultant's advice? (Show all calculations and explain clearly how you arrived at your answer).
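A decision table is a payoff matrix, so each decision rule in questions 1 to 5 reduces to a little arithmetic over its rows and columns. The Python sketch below is offered only as a way to check hand or spreadsheet calculations, not as the required method. It encodes George's payoffs in dollars for the $100,000 investment; the names PAYOFFS, EQUAL and GIVEN are hypothetical. The expected value of perfect information (EVPI) is the standard upper bound on what a perfect forecast is worth.

```python
# Decision table for investing $100,000: payoff (profit in dollars) per state of the market.
PAYOFFS = {
    "stocks": {"good": 12_000, "fair": 5_000, "poor": -2_000},  # 12%, 5%, -2% returns
    "bonds": {"good": 6_000, "fair": 6_000, "poor": 6_000},     # fixed 6% in every state
}


def maximax(payoffs):
    """Optimistic rule: choose the alternative with the best best-case payoff."""
    return max(payoffs, key=lambda alt: max(payoffs[alt].values()))


def maximin(payoffs):
    """Pessimistic rule: choose the alternative with the best worst-case payoff."""
    return max(payoffs, key=lambda alt: min(payoffs[alt].values()))


def expected_values(payoffs, probs):
    """Expected monetary value (EMV) of each alternative under the given state probabilities."""
    return {alt: sum(probs[state] * value for state, value in row.items())
            for alt, row in payoffs.items()}


def evpi(payoffs, probs):
    """Expected value of perfect information: EV with perfect information minus the best EMV."""
    # With perfect information George picks the best alternative for each state.
    ev_with_info = sum(p * max(row[state] for row in payoffs.values())
                       for state, p in probs.items())
    return ev_with_info - max(expected_values(payoffs, probs).values())


EQUAL = {"good": 1 / 3, "fair": 1 / 3, "poor": 1 / 3}
GIVEN = {"good": 0.5, "fair": 0.3, "poor": 0.2}

if __name__ == "__main__":
    print("maximax:", maximax(PAYOFFS))
    print("maximin:", maximin(PAYOFFS))
    print("EMV, equal probabilities:", expected_values(PAYOFFS, EQUAL))
    print("EMV, given probabilities:", expected_values(PAYOFFS, GIVEN))
    print("EVPI:", evpi(PAYOFFS, GIVEN))
```

For question 5, the consultant's $2,000 fee is worth paying only if it is below evpi(PAYOFFS, GIVEN); that value is also the most George should be willing to pay.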
Problem 2: Portfolio planning using optimization
Gerry has just obtained a job in portfolio planning at a newly created investment company. His manager has given him the responsibility of investing $10 million, and he must maximise the expected return of the investment over the next year. He has four investment alternatives available to him. The expected return for each of these alternatives is given in the following table.
Investment Type | Expected return (%)
Cash | 3
Listed Property | 5
Australian Bonds | 7
Stocks | 12
There are some additional constraints on how the funds can be invested:
- a minimum of 25% of the funds is to be placed in cash;
- the amount in stocks cannot be more than double the amount in bonds;
- a maximum of 35% of the funds may be placed in stocks;
- the combined amount in bonds and stocks cannot exceed the combined amount in cash and property;
- all of the available $10 million must be invested; and
- each investment must be in multiples of $10,000.
Set this problem up as a linear programming model in Excel, and use your model to answer the following questions:
- How should the $10 million be invested?
- What is the overall return (in dollar terms)?
- What is the overall return as a percentage of the $10 million invested?
How do your answers to the above questions change if the return from stocks is now expected to be only 5%?
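Because every amount must be a multiple of $10,000, the model is really a small integer program, and its answer can be cross-checked outside Excel. The Python sketch below is a brute-force check under one stated assumption: property earns at least as much as cash, so (since the cash/property split only enters through the 25% cash minimum) cash is held at that minimum and the rest of the cash/property share goes to property. It enumerates bonds and stocks in $10,000 units; the function and constant names are hypothetical, and it is not a substitute for the Solver model the question asks for.

```python
# Brute-force sketch of Problem 2, working in units of $10,000 (every amount is a multiple of $10,000).
TOTAL = 1000        # $10 million in $10,000 units
MIN_CASH = 250      # at least 25% in cash
MAX_STOCKS = 350    # at most 35% in stocks


def best_portfolio(returns_pct):
    """Return ((cash, property, bonds, stocks) in units, annual return in dollars).

    Assumes property earns at least as much as cash: the cash/property split
    only appears in the 'cash >= 25%' constraint, so cash is held at its
    minimum and the remainder of that share goes to property.
    """
    if returns_pct["property"] < returns_pct["cash"]:
        raise ValueError("sketch assumes property returns >= cash returns")
    best = None
    for stocks in range(MAX_STOCKS + 1):
        for bonds in range(TOTAL - stocks + 1):
            rest = TOTAL - bonds - stocks        # cash + property (all funds invested)
            if bonds + stocks > rest:            # bonds + stocks <= cash + property;
                break                            # more bonds only makes this worse
            if stocks > 2 * bonds:               # stocks <= 2 x bonds
                continue
            if rest < MIN_CASH:
                continue
            cash, prop = MIN_CASH, rest - MIN_CASH
            # 1% of one $10,000 unit is $100, so the return stays an exact integer.
            dollars = 100 * (returns_pct["cash"] * cash + returns_pct["property"] * prop
                             + returns_pct["bonds"] * bonds + returns_pct["stocks"] * stocks)
            if best is None or dollars > best[1]:
                best = ((cash, prop, bonds, stocks), dollars)
    return best


if __name__ == "__main__":
    base = {"cash": 3, "property": 5, "bonds": 7, "stocks": 12}
    for label, rates in [("stocks 12%", base), ("stocks 5%", dict(base, stocks=5))]:
        (cash, prop, bonds, stocks), dollars = best_portfolio(rates)
        print(f"{label}: cash ${cash * 10_000:,}, property ${prop * 10_000:,}, "
              f"bonds ${bonds * 10_000:,}, stocks ${stocks * 10_000:,}; "
              f"return ${dollars:,} ({dollars / 100_000:.3f}%)")
```

Enumeration is practical here only because the problem is tiny (a few hundred thousand candidate allocations); an LP or integer-programming solver such as Excel Solver scales far better.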
Problem 3: Simulating a sales plan
Joe is the manager of an electronics store that sells TVs, HiFis, computers, and various other electronic devices. For next month Joe is planning a promotion on a discontinued model of a popular tablet computer, which has been a good seller over the last few months. He plans to run the promotion for 10 days. Joe is able to purchase the tablets from the manufacturer for $350, and he will sell them to his customers for $600. Any tablets that have not been sold at the end of the promotion will be sold to another retailer for $250.
Joe can only place one order with the manufacturer, and he must do this before the promotion begins. He doesn't know exactly what the demand will be, but estimates that on any particular day the probability of selling no tablets will be 10%; the probability of selling one tablet will be 15%; the probability of selling two tablets will be 25%; the probability of selling three tablets will be 30%; the probability of selling four tablets will be 15%; and the probability of selling five tablets will be 5%. He believes that there is zero probability of selling more than five tablets on any one day.
Obviously Joe would like to maximise his profit over the period of the promotion, and in order to do this he must order an appropriate number of tablets from the manufacturer. If he orders too few, he may not have a sufficient number to meet customer demand; if he orders too many, then his stock may exceed customer demand, and he will be forced to pass the tablets on to the other retailer.
Create a simulation model in Excel to assist Joe in determining how many tablets he should order.
Use your simulation model to calculate the average net profit Joe would make for various order quantities, and present your findings in a graph. (You do not need to try every possible order quantity; instead, increase the order quantity in steps of, say, 5. But do simulate over a wide range of order quantities, say from 10 to 50.) Perform enough trials to obtain a reliable estimate of the mean, and also a reasonable estimate of the spread in profits for each order quantity (i.e., for each order quantity, calculate the standard deviation as well as the mean).
Based on your results, what advice would you give Joe? Make sure that you comment not only on the mean profit, but also on the variability that arises from different order quantities.
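The logic the Excel model needs can be sketched in a few lines of Python, which may help when checking spreadsheet formulas. This is a minimal Monte Carlo sketch under stated assumptions: tablets not sold on one day remain available for the rest of the promotion, so the number sold is the smaller of the order quantity and the total 10-day demand; unmet demand is simply lost; and each day's demand is drawn independently from Joe's distribution. The function names are hypothetical.

```python
import random
import statistics

COST, PRICE, SALVAGE = 350, 600, 250      # dollars per tablet
DAYS = 10
DAILY_DEMAND = [0, 1, 2, 3, 4, 5]
DAILY_PROBS = [0.10, 0.15, 0.25, 0.30, 0.15, 0.05]


def promotion_profit(order_qty, total_demand):
    """Net profit for one 10-day promotion with a single up-front order."""
    sold = min(order_qty, total_demand)   # demand beyond the order is lost
    leftover = order_qty - sold           # passed on to the other retailer
    return PRICE * sold + SALVAGE * leftover - COST * order_qty


def simulate(order_qty, trials=5000, seed=None):
    """Return (mean, sample standard deviation) of profit over many simulated promotions."""
    rng = random.Random(seed)
    profits = []
    for _ in range(trials):
        daily = rng.choices(DAILY_DEMAND, weights=DAILY_PROBS, k=DAYS)
        profits.append(promotion_profit(order_qty, sum(daily)))
    return statistics.mean(profits), statistics.stdev(profits)


if __name__ == "__main__":
    for q in range(10, 55, 5):
        mean, sd = simulate(q, seed=42)
        print(f"order {q:2d}: mean profit ${mean:8.0f}, std dev ${sd:7.0f}")
```

Each printed row gives one point for the mean-profit graph and its standard deviation. In a spreadsheet the same daily draws are commonly made with RAND() and a cumulative-probability lookup table, with one row per trial.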
Problem 4: Predicting Hospital Expenses using regression
Hospitals are very expensive organisations to run, and the cost depends on many variables, two of which are the number of beds in the hospital, and the number of admissions. The table below shows data for 14 hospitals.
Beds | Admissions | Total Expense (millions)
504 | 24000 | 191
203 | 6450 | 36
458 | 14700 | 95
63 | 4350 | 23
315 | 23250 | 140
210 | 7950 | 68
323 | 11550 | 86
75 | 2700 | 18
53 | 1350 | 21
135 | 900 | 9
165 | 4200 | 32
98 | 2400 | 17
780 | 34500 | 236
615 | 23850 | 149
Use WEKA to create three regression models for predicting total expense.1
- Model 1 should use only the number of beds as input
- Model 2 should use only the number of admissions as input
- Model 3 should predict total expense on the basis of both the number of beds and the number of admissions.
For each model, record the regression equation, the training error, and the leave-one-out cross-validation error.
Use the regression equation from each model to predict the total expense of running a hospital with 350 beds and 20,000 admissions.
Which model do you believe provides the most reliable prediction? You MUST justify your answer based on relevant data from the results that you have provided.
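To see what the training error and the leave-one-out error measure, the Python sketch below fits each model by ordinary least squares on the 14 hospitals and then refits it 14 times, each time predicting the one hospital left out. It is a plain OLS sketch, not a reproduction of WEKA: WEKA's LinearRegression may apply attribute selection and a small ridge term by default, and it reports several error measures, so its numbers can differ; here root mean squared error (RMSE) is used throughout. The function names are hypothetical. In WEKA, leave-one-out corresponds to cross-validation with the number of folds set to the number of instances (14 here).

```python
import math

# (beds, admissions, total expense in millions) for the 14 hospitals in the table above
DATA = [
    (504, 24000, 191), (203, 6450, 36), (458, 14700, 95), (63, 4350, 23),
    (315, 23250, 140), (210, 7950, 68), (323, 11550, 86), (75, 2700, 18),
    (53, 1350, 21), (135, 900, 9), (165, 4200, 32), (98, 2400, 17),
    (780, 34500, 236), (615, 23850, 149),
]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(xs, ys):
    """Ordinary least squares via the normal equations; returns [intercept, b1, b2, ...]."""
    rows = [[1.0] + list(x) for x in xs]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)


def predict(coef, x):
    return coef[0] + sum(b * v for b, v in zip(coef[1:], x))


def rmse(errors):
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def evaluate(columns):
    """Return (coefficients, training RMSE, leave-one-out RMSE) using the given input columns."""
    xs = [[row[c] for c in columns] for row in DATA]
    ys = [row[2] for row in DATA]
    coef = fit_ols(xs, ys)
    train = [y - predict(coef, x) for x, y in zip(xs, ys)]
    loo = []
    for i in range(len(DATA)):                 # refit without hospital i, then predict it
        coef_i = fit_ols(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        loo.append(ys[i] - predict(coef_i, xs[i]))
    return coef, rmse(train), rmse(loo)


MODELS = {"beds": [0], "admissions": [1], "beds + admissions": [0, 1]}
NEW_HOSPITAL = (350, 20000)                    # 350 beds, 20,000 admissions

if __name__ == "__main__":
    for name, cols in MODELS.items():
        coef, train_rmse, loo_rmse = evaluate(cols)
        pred = predict(coef, [NEW_HOSPITAL[c] for c in cols])
        print(f"{name}: coef={coef}, train RMSE={train_rmse:.2f}, "
              f"LOOCV RMSE={loo_rmse:.2f}, prediction={pred:.1f}")
```

Adding an input can never raise the training error of a least-squares fit, which is why the leave-one-out error, not the training error, is the fairer basis for judging which model predicts most reliably.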
Problem 5: Applying MLPs to the prediction of house prices
The Housing dataset is a well-known dataset that is widely used for comparing the performance of data-mining and machine learning techniques on regression tasks. The dataset can be obtained from the UCI Machine Learning Repository. The following URL will take you to the UCI web page for this dataset:
https://archive.ics.uci.edu/ml/datasets/Housing
Read the documentation for this dataset, and then go to the Data Folder and download the file ‘housing.data’. Alternatively, you can download a .csv version of the data from the CSE5DSS LMS Page.
Your task is to carry out experiments to compare the performance of linear regression and multilayer perceptrons (MLPs) on predicting the value of homes. You should use the cross-validation test option, keeping the number of folds constant over each trial. (It is up to you to choose a suitable number of folds, e.g., 10.)
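WEKA performs cross-validation internally, so no code is required here. For readers who want to see what the option does, the Python sketch below assumes a generic, hypothetical fit/predict pair: the examples are shuffled once, dealt into k folds, and each fold is held out in turn while the model is trained on the remaining folds; squared errors from all held-out folds are pooled into one RMSE. Keeping both k and the shuffling seed fixed means every configuration is scored on identical folds, which keeps comparisons between trials fair. The baseline "model" that predicts the training mean is only a stand-in, not linear regression or an MLP.

```python
import math
import random


def kfold_indices(n, k, seed=0):
    """Shuffle 0..n-1 once and deal the indices into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def cross_validated_rmse(xs, ys, fit, predict, k=10, seed=0):
    """Train on k-1 folds, test on the held-out fold, and pool all squared errors into one RMSE."""
    sq_err = []
    for test in kfold_indices(len(ys), k, seed):
        held_out = set(test)
        train = [i for i in range(len(ys)) if i not in held_out]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        sq_err += [(ys[i] - predict(model, xs[i])) ** 2 for i in test]
    return math.sqrt(sum(sq_err) / len(sq_err))


# Hypothetical baseline "model" that ignores the inputs and predicts the training-set mean.
def fit_mean(xs, ys):
    return sum(ys) / len(ys)


def predict_mean(model, x):
    return model
```

Any learner can be plugged in through the fit and predict arguments; comparing its cross-validated RMSE against a trivial baseline like this one is a quick sanity check.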
Perform the following:
(i) Apply linear regression to this problem, using the default settings in WEKA. Record the root mean squared error.
(ii) Now use an MLP with one hidden layer, containing what you believe to be a suitable number of hidden units in that hidden layer (various rules of thumb were described in the lectures). Vary the training time from 100 to 2000 in increments of 100, recording the mean squared error in each case. Plot a graph showing how the mean squared error varies with training time.
(iii) Now try varying the number of units in the hidden layer of the multilayer perceptron, fixing the training time to that which resulted in the best performance in (ii) above. Use at least five different values for the number of hidden units. (Choose values over a significant range). Plot a graph showing how mean squared error varies with the number of hidden units.
(iv) Based on your results from (ii) and (iii) above, try to find an MLP configuration (training time and number of hidden units) which you believe is close to optimal for this problem.
Problem 6: Classifying credit risk
The German Credit dataset is a well-known dataset that is widely used for comparing the performance of data-mining and machine learning techniques on classification tasks. The dataset can be obtained from the UCI Machine Learning Repository. The following URL will take you to the UCI web page for this dataset:
https://archive.ics.uci.edu/ml/datasets/Statlog+%28German+Credit+Data%29
Read the documentation for this dataset, and then go to the Data Folder and download the file ‘german.data’. Alternatively, you can download a .csv version of the data from the CSE5DSS LMS Page.
Answer the following preliminary questions:
i. How many features or attributes does the data contain?
ii. How many of the attributes are numeric?
iii. How many of the attributes are categorical (including binary)?
iv. How many examples does the data contain?
v. Which attribute represents the class variable?
vi. How many possible values can the class variable take?
vii. What does each of the values of the class variable represent (i.e., good credit or bad credit)?
Now load the file into WEKA and compare the performance of each of the following classifiers using 10-fold cross-validation:
- J48 (this is the WEKA version of Quinlan's C4.5)
- Logistic Regression
- Naïve Bayes
- MLP
(Use the default WEKA settings for each classifier.)
Present the confusion matrix showing the results for each of the four classifiers, and for each case, calculate the accuracy, precision, and recall. (IMPORTANT: When presenting your confusion matrices, make sure that it is clear what is being represented in rows and columns; i.e., ‘actual’ classes, or ‘predicted’ classes.)
As described in the documentation for the dataset, the costs of misclassification are not equal: it is worse (in fact, five times worse) to classify a customer as good when they are bad than it is to classify a customer as bad when they are good. Using the results that you have provided above, calculate the weighted misclassification error for each of the classifiers and, on the basis of these calculations, recommend which of the classifiers is the best to use on this dataset. Make sure that you show all calculations, and provide a clear justification for your answer.
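Once the four confusion matrices have been read off WEKA's output, the remaining metrics are simple ratios. The Python sketch below is one way to compute them, assuming each matrix is stored with rows as actual classes and columns as predicted classes, and using the labels "good" and "bad" (hypothetical names; map them to the dataset's own class codes). Precision and recall depend on which class is treated as positive, so that is an explicit argument. The weighted error here is the average cost per example with costs of 5 and 1; if the course defines it as total cost instead, the ranking of classifiers is unchanged, because all four are evaluated on the same number of examples.

```python
# Confusion matrix stored as counts[actual][predicted] (rows = actual, columns = predicted).

def total(cm):
    return sum(sum(row.values()) for row in cm.values())


def accuracy(cm):
    """Fraction of all examples on the diagonal (correctly classified)."""
    return sum(cm[c][c] for c in cm) / total(cm)


def precision(cm, positive):
    """Of the examples predicted as `positive`, the fraction that really are."""
    predicted_positive = sum(cm[actual][positive] for actual in cm)
    return cm[positive][positive] / predicted_positive


def recall(cm, positive):
    """Of the examples that really are `positive`, the fraction predicted as such."""
    return cm[positive][positive] / sum(cm[positive].values())


# Cost of each kind of error, keyed by (actual, predicted): calling a bad
# customer good is five times worse than calling a good customer bad.
COSTS = {("bad", "good"): 5, ("good", "bad"): 1}


def weighted_error(cm, costs=COSTS):
    """Average misclassification cost per example (one possible definition)."""
    return sum(w * cm[a][p] for (a, p), w in costs.items()) / total(cm)
```

Calling these functions on each classifier's matrix gives the figures for the comparison table; the classifier with the lowest weighted error is the one the cost structure favours.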