Tasks
Most of your statistical calculations should be carried out in Excel, and you will use Microsoft Word and Excel to complete this assignment.
1. Select a Random Sample
Select a random sample of size 50 from the given 1000 cases. You will use this sample data to complete tasks 2 to 6. If repeat cases occur in your random sample, include each case only once, so you may end up with a sample size of less than 50. In that situation, do not draw further cases to reach 50; continue to work with your sample, even if it is smaller than 50.
Save your sample in a separate file called SurnameI_SAMPLE.xlsx, where Surname is your surname and I is the first initial of your first name, for example SmithJ_SAMPLE.xlsx.
In the appendix, explain how you obtained your sample and provide a list of your customer data.
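The sampling rule above (draw 50 cases, keep each repeated case only once, do not top up) can be illustrated outside Excel with a minimal Python sketch. It assumes the cases are identified by IDs 1 to 1000 and that draws are made with replacement, which is one way repeat cases can arise; the function name `draw_sample` and the seed value are hypothetical, and the assignment itself should still be completed in Excel.

```python
import random

def draw_sample(case_ids, draws=50, seed=None):
    """Draw `draws` case IDs with replacement, then keep each case once."""
    rng = random.Random(seed)
    picks = [rng.choice(case_ids) for _ in range(draws)]
    # dict.fromkeys drops repeats while preserving the order of first appearance.
    return list(dict.fromkeys(picks))

# Assumption: the 1000 cases carry IDs 1..1000.
case_ids = list(range(1, 1001))
sample_ids = draw_sample(case_ids, draws=50, seed=2015)  # arbitrary seed

# The sample may be smaller than 50 because repeats are not replaced.
print(len(sample_ids), "distinct cases selected")
```

Fixing the seed makes the selection reproducible, which helps when documenting in the appendix how the sample was obtained.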
2. Descriptive Statistics
Use appropriate data summary methods to describe the variables in your data sample. For each variable except the ID variable, use one appropriate Tabulation Technique, one appropriate Graphical Technique and appropriate summary statistics, chosen according to the type of variable. These techniques will be chosen from:
Tabulation Techniques: Frequency tables or Grouped frequency tables.
Graphical Techniques: Pie chart, Bar graph, Histogram, Frequency Polygon.
Summary Statistics: Mode, Median, Mean, Standard Deviation, Range, Coefficient of Variation and Interquartile Range.
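As a cross-check on the Excel output, the tabulation and summary statistics listed above can be sketched in Python. The function names `frequency_table` and `summarise` and the example values are made up for illustration. The quartiles use the standard library's "inclusive" interpolation, so they may differ slightly from the quartile convention used in your spreadsheet.

```python
import statistics
from collections import Counter

def frequency_table(values):
    """Count and percentage for each category of a categorical variable."""
    counts = Counter(values)
    total = len(values)
    return {category: (count, 100 * count / total)
            for category, count in counts.items()}

def summarise(values):
    """Summary statistics for one numeric variable (sample formulas)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": mean,
        "median": statistics.median(values),
        "modes": statistics.multimode(values),  # list, as ties are possible
        "sd": sd,
        "range": max(values) - min(values),
        "cv_percent": 100 * sd / mean,  # coefficient of variation
        "iqr": q3 - q1,
    }

# Made-up example data, not taken from the assignment dataset.
ages = [23, 25, 25, 31, 36, 40, 44, 52]
summary = summarise(ages)
table = frequency_table(["M", "F", "F", "M", "F"])
print(summary)
print(table)
```

Which of these statistics is appropriate still depends on the type of variable: for example, a mode and a frequency table suit a categorical variable, whereas the mean and standard deviation do not.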
NB: You will need to choose the most appropriate technique(s) for each variable being analysed. Less appropriate or inappropriate techniques will receive fewer or no marks. Do not present an Ogive curve, a Stem plot or a Box plot in this assignment. Use as wide a range of graph types as possible.
FIN 10002 Assignment, Semester 2, 2015
3. Confidence Intervals
Use your sample data to estimate the following quantities, using 95% confidence intervals and assuming a normal distribution. Explain the meaning of your confidence intervals.
(i) The average amount of credit given.
(ii) The average duration of the account.
In the main section of the report, compare these intervals with the respective true means calculated from the full dataset.
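The interval calculation can be sketched as follows, as a check on the Excel result. This sketch assumes a standard normal critical value (about 1.96 for 95%); an interval based on the t distribution with n - 1 degrees of freedom would be slightly wider for a sample of around 50. The function name `mean_ci`, the example values and the `true_mean` figure are all hypothetical.

```python
import math
import statistics

def mean_ci(values, confidence=0.95):
    """Normal-approximation confidence interval for a population mean."""
    n = len(values)
    mean = statistics.mean(values)
    std_error = statistics.stdev(values) / math.sqrt(n)
    # Two-sided critical value, e.g. about 1.96 when confidence = 0.95.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * std_error, mean + z * std_error

# Made-up credit amounts, not taken from the assignment dataset.
credit = [10, 12, 14, 16, 18]
lower, upper = mean_ci(credit)

# Hypothetical full-dataset mean, used only to show the comparison step.
true_mean = 15.0
covers_true_mean = lower <= true_mean <= upper
print(round(lower, 2), round(upper, 2), covers_true_mean)
```

The final comparison mirrors the last step of the task: checking whether the mean of all 1000 cases falls inside the interval estimated from the sample.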
4. Hypothesis Testing
(a) It is often felt that female account holders have more difficulty obtaining credit than male account holders. Investigate this contention by carrying out an appropriate hypothesis test using two of the variables from your data, assuming that the necessary assumptions are met. Include a suitable frequency polygon plot and a table of means, standard deviations and sample sizes in the main section of your report.
(b) It is often felt that the average age will differ for people with good and bad credit risks. Investigate this contention by carrying out an appropriate hypothesis test using two of the variables from your data, assuming that the necessary assumptions are met. Include a suitable frequency polygon plot and a table of means, standard deviations and sample sizes in the main section of your report.
Only report a non-technical explanation of your methodology and your findings in the main section of the report.
The computations and output should be placed in the appendix.
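One possible reading of both parts is a comparison of two group means, which could be checked with the sketch below. It assumes an equal-variance (pooled) two-sample t-test, which is an assumption of this sketch rather than something the task prescribes. The function names and example values are hypothetical, and the p-value is obtained by numerically integrating the t density instead of calling a statistics library.

```python
import math
import statistics

def pooled_t(group_a, group_b):
    """Pooled two-sample t statistic and its degrees of freedom."""
    n1, n2 = len(group_a), len(group_b)
    m1, m2 = statistics.mean(group_a), statistics.mean(group_b)
    v1, v2 = statistics.variance(group_a), statistics.variance(group_b)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / df
    t = (m1 - m2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df

def t_two_sided_p(t, df, steps=2000):
    """Two-sided p-value: 1 - 2 * integral of the t density over [0, |t|]."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)

    def pdf(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = pdf(0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return max(0.0, 1 - 2 * total * h / 3)

# Made-up values for two groups, not taken from the assignment dataset.
group_1 = [1200, 1500, 900, 2000, 1100]
group_2 = [1800, 2100, 1600, 2500, 1900]
t_stat, df = pooled_t(group_1, group_2)
p_two_sided = t_two_sided_p(t_stat, df)
print(t_stat, df, p_two_sided)
```

For a one-sided alternative such as the one in part (a), the two-sided p-value would be halved when the sample difference lies in the hypothesised direction. Part (b) asks only whether the means differ, so the two-sided value applies directly.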
5. Correlation and Regression
In this section you will investigate the relationship between the amount of credit given and duration of the account.
Using these two variables, develop a regression model to predict the amount of credit given from the duration of the account, assuming the necessary assumptions are met.
Make sure that you undertake a full regression analysis with appropriate discussion, and include the following in the main section of the report:
- a scattergram and a brief discussion
- an estimate of the linear regression model
- the coefficients of correlation and determination
- a test of the hypothesis that there is no linear relationship between the amount of credit given and the duration of the account.
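The quantities in the list above can be cross-checked with a minimal least-squares sketch. The function name `simple_regression` and the example values are hypothetical. The test statistic for the hypothesis of no linear relationship is computed as t = r * sqrt((n - 2) / (1 - r^2)), which has n - 2 degrees of freedom; the sketch does not handle a perfect correlation (r = 1 or -1), where this formula divides by zero.

```python
import math

def simple_regression(x, y):
    """Least-squares fit of y on x, with r, r squared and the slope t statistic."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)  # coefficient of correlation
    t = r * math.sqrt((n - 2) / (1 - r * r))  # tests H0: no linear relationship
    return {"slope": slope, "intercept": intercept, "r": r,
            "r_squared": r * r, "t": t, "df": n - 2}

# Made-up values: x stands for duration, y for credit amount.
duration = [1, 2, 3, 4, 5]
amount = [2, 4, 5, 4, 5]
fit = simple_regression(duration, amount)

# Prediction from the fitted line for a hypothetical duration of 6.
predicted = fit["intercept"] + fit["slope"] * 6
print(fit, predicted)
```

Here the duration is the predictor and the credit amount is the response, matching the direction of prediction stated in the task.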
Attachment:- Assignment-Data.xlsx