An introduction to your project
For Assignments 2 and 3 you are required to collect and analyse some chosen data to provide answers to some specific questions you have posed. The following information relates to both Assignment 2 (your project plan) and Assignment 3 (your project).
Read this section carefully before you submit your project plan.
Select your topic
Selecting a topic for your project is often the hardest part. It is best if you obtain data from within your workplace or related to your degree major, although some people like to use data relating to a hobby, community interest or sport. If you cannot think of a topic:
- look at the list of possible topics at the end of this section
- see the list of possible data sources provided on the Assignment 2 page of the course
- put a keyword into an internet search engine. A lot of data is available in raw form, such as sports and weather data. New Zealand data is preferable
- look around your home and see what sort of data is available - television guides, magazines, newspapers, books, CDs and so on.
Note that you need raw data, not data that has already been totalled and summarised. The Statistics NZ website provides only aggregated data, not raw data, so data from that website is not suitable for the project. Time series data is also not suitable. If you are not sure whether your data is suitable, contact your lecturer.
What are your aims?
You will need to specify in detail two or more questions that you want to answer. These should be more specific than ‘to find out about the volume of calls’. Instead, each question should give some indication of what data you need to collect and how it will be analysed. Some examples are:
- to quantify the relationship between advertising expenditure and retail sales for business A
- to determine if the average time to answer calls in a call centre differs between Mondays and Fridays
- to see if the style of office communication used is dependent on gender.
To give yourself enough scope for a good project you need at least two questions, but not too many, as time will be limited. Ensure the questions you pose can be answered using some of the analysis techniques taught in the course. Pose your questions before you collect your data, so that you collect the data needed to answer them and no unnecessary data.
Data collection
You need a random sample of about 100 items drawn from a larger population. You will need to specify in detail how your data was collected and which sampling methods you have used. You can use existing records (for example, existing accounts, maintenance records, phone logs), but they should consist of individual data items, not summarised data.
Alternatively, you may choose to collect your data from scratch (but see the Ethics heading below). Include mention of the accuracy of your data and how you dealt with missing values. State clearly how your sampling was done, or any assumptions you are making.
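One way to draw the sample described above is sketched below in Python. This is only an illustration: the population of 2,500 numbered call records, the function name simple_random_sample and the seed value are all assumptions, not part of the course material. Recording the seed lets you state exactly how the sample was drawn.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n items from population without replacement."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return rng.sample(population, n)


# Hypothetical population: 2,500 call records identified by number, not by name.
call_ids = list(range(1, 2501))

# A sample of about 100, as the project requires.
sample_ids = simple_random_sample(call_ids, 100, seed=42)
print(len(sample_ids), "records selected")
```

Because the records are drawn without replacement, no record can appear in the sample twice.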
Statistical analysis
The aim of this project is for you to demonstrate the use of the statistical techniques taught in this course. One of the more difficult aspects, and also one of the most important, is deciding on the most appropriate technique to use. If you are unsure, contact your lecturer to discuss it.
Normally you will start by exploring your data using graphs and descriptive statistics. These are used to get an overall picture of how your data looks and should lead into your higher-level analyses. Note that hypothesis testing relies on random sampling, so you might need to make some assumptions (for example, you might make the assumption that the data available from the last four weeks can be regarded as a random sample of some longer period).
You will need descriptive statistics and graphs plus at least two of the following types of analyses:
- t-test and confidence interval for mean (one-sample)
- t-test and confidence interval for means (two-sample)
- t-test and confidence interval for paired samples (matched pairs)
- chi-square test and confidence interval for proportions
- chi-square test of association
- regression analysis.
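As an illustration of the last item in the list, the sketch below fits a least-squares line and computes the correlation coefficient using only the Python standard library. The advertising and sales figures are invented for the example, and the function name least_squares is an assumption. The software used in the course would normally report these values along with the inference (test and confidence interval for the slope).

```python
from statistics import mean


def least_squares(x, y):
    """Return (slope, intercept, r) for the line y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r = sxy / (sxx * syy) ** 0.5  # correlation coefficient
    return slope, intercept, r


# Invented data: advertising spend and retail sales (both in $000).
advertising = [10, 12, 15, 18, 20, 25]
sales = [110, 118, 135, 150, 158, 190]

slope, intercept, r = least_squares(advertising, sales)
print(f"sales = {intercept:.1f} + {slope:.2f} * advertising, r = {r:.3f}")
```

A scatterplot should still be examined first, since a straight line is only appropriate if the relationship looks roughly linear.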
Ethics
It is best that you avoid projects that involve surveying people by questionnaire. This could take too long and it could also involve problems of informed consent. Also, keep in mind privacy issues. Individual names should not be used - instead use numbers to identify individuals. If you are using business data, get approval for its use first. If your business or organisation needs a letter assuring confidentiality, ask your lecturer. The name of the business is not needed, just an idea of the type of business.
Some possible project topics include:
- the volume of sales and advertising expenditure
- charge-out rates for tradespeople
- type of calls to a call centre
- internet sales patterns
- speed of passing traffic
- preferences for communication forms
- reading ability of children
- lending patterns in libraries
- speed of internet connections
- volume of recycling.
Assignment
Write a plan for the project you intend to complete for Assignment 3, using the template provided on the Assignment 2 page in your course. A completed template is also provided - use this for further guidance.
The suggested word count is 800 - 1000 words.
By completing Assignment 2, you should have shown that you can plan an investigation, form suitable aims, find suitable data and plan appropriate analyses.
As your lecturer assesses your responses, they will be looking for:
- a brief description of the project, with reasons for your choice
- a description of the data resource and variables
- two or more specifically stated aims (research questions)
- details of sampling and data collection
- details of proposed analysis
- a sample of the data.
Project plan
Section A: Topic and aims
Title
The title of my project is:
Introduction
Give a brief description of your project. Discuss why it is important to investigate your aims and why the project is of general interest. This should be covered briefly here and in more depth in the actual project.
Aims
Specify two or more aims. These should be written in clear statistical language. For example:
1. to test if the mean value of a variable differs for two populations (specify the variable and populations)
2. to estimate the proportion of ...... from a stated population
3. to investigate if there is a linear relationship between ... and... (specify the two numeric variables)
4. to investigate if there is any association between ... and... (specify the two categorical variables)
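For an aim like the fourth example, the analysis is usually a chi-square test of association on a table of counts. The sketch below computes the expected counts, the test statistic and the degrees of freedom. The counts (communication style by gender) are invented, and the function name chi_square_association is an assumption. The p-value would come from chi-square tables or the course software.

```python
def chi_square_association(table):
    """Return (statistic, df, expected) for a two-way table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    # Expected count for each cell = row total * column total / grand total.
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    statistic = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, expected


# Invented counts. Rows: men, women. Columns: email, phone, face to face.
observed = [
    [18, 12, 20],
    [22, 15, 13],
]

statistic, df, expected = chi_square_association(observed)
# A common guideline is that every expected count should be at least 5.
smallest_expected = min(min(row) for row in expected)
print(f"chi-square = {statistic:.2f}, df = {df}, smallest expected = {smallest_expected:.1f}")
```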
Section B: Data collection
Data and sampling
State the source of your data (give the link if this comes from a website) and how the data was collected. You need a simple random sample from a larger population or a stratified random sample (e.g. divide your population into men and women and randomly sample from each). State what your population is, your method of sampling and your sample size.
1. Treat your dataset (including recommended datasets) as your population. Then take a random sample from that population. A sample size of about 100 is suitable.
2. Data which has already been summarised, such as weekly totals of phone calls, cannot be used. You need a sample of individual phone calls.
3. Data which is in time series order (e.g. weekly sales for the last two years) is not a random sample and can only be used in conjunction with other data.
4. Possible sources for data have been provided.
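If you choose a stratified random sample, the idea is to split the population into groups and draw a simple random sample from each. The sketch below assumes a hypothetical population of 600 records, each tagged as "men" or "women", and allocates the 100 sampled records roughly in proportion to group size. The record layout, the function name stratified_sample and the allocation are illustrative assumptions.

```python
import random
from collections import defaultdict


def stratified_sample(records, stratum_of, n_per_stratum, seed=None):
    """Draw a simple random sample of the requested size from each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for record in records:
        strata[stratum_of(record)].append(record)
    return {name: rng.sample(strata[name], k) for name, k in n_per_stratum.items()}


# Hypothetical population: (id, group) pairs, 200 men and 400 women.
population = [(i, "men" if i % 3 == 0 else "women") for i in range(1, 601)]

# Roughly proportional allocation of a total sample of 100.
allocation = {"men": 33, "women": 67}

sample = stratified_sample(population, lambda rec: rec[1], allocation, seed=7)
for group, members in sample.items():
    print(group, len(members))
```

Whichever allocation you use, state it in your plan along with the population and the sample size.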
Variables
Name your variables and state whether they are numeric or categorical variables. Give the range of possible values each variable can take.
Section C: Data analysis
Specify your planned analysis to answer Aim 1.
This should include the exploratory work (summary statistics, check for outliers), graphs you will use and a form of inference.
For example, if the aim is to compare exam marks for men and women you might say your analysis will include:
- Histograms of men's marks and women's marks with comments
- Five-number summary of marks for men and women with comments
- Side-by-side boxplots of marks with comments
- Check for outliers using the 1.5 × IQR rule and comment on outlier values
- Mean and standard deviation of men's and women's marks with comments
- QQ plots of men's and women's marks with comments
- A two-sample t test to compare mean marks for men and women*
- A confidence interval for the difference between the two means
* Think carefully about whether a two-sample t test or a matched-pairs t test is appropriate.
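Two of the steps in the list above, the outlier check and the two-sample comparison, can be sketched in Python as follows. Several things here are assumptions. The marks are invented. Quartile conventions differ between software packages, so the fences may not match your course software exactly. The t statistic uses the unequal-variance (Welch) form, which may differ from the version taught in the course. The interval uses a normal critical value as a large-sample approximation, so in your project use the t critical value from tables or software instead.

```python
import math
from statistics import NormalDist, mean, quantiles, stdev


def iqr_outliers(values):
    """Return the values lying more than 1.5 * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]


def welch_t(a, b, confidence=0.95):
    """Return (t statistic, approximate df, approximate CI for mean(a) - mean(b))."""
    diff = mean(a) - mean(b)
    var_a = stdev(a) ** 2 / len(a)
    var_b = stdev(b) ** 2 / len(b)
    se = math.sqrt(var_a + var_b)
    t_stat = diff / se
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (var_a + var_b) ** 2 / (var_a ** 2 / (len(a) - 1) + var_b ** 2 / (len(b) - 1))
    # Normal critical value: an approximation that suits larger samples only.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return t_stat, df, (diff - z * se, diff + z * se)


# Invented exam marks, kept short for illustration; a real sample would be larger.
men_marks = [52, 61, 58, 70, 66, 49, 73, 64, 59, 95]
women_marks = [63, 68, 71, 59, 75, 66, 80, 62, 69, 72]

print("Possible outliers (men):", iqr_outliers(men_marks))
print("Possible outliers (women):", iqr_outliers(women_marks))
t_stat, df, interval = welch_t(men_marks, women_marks)
print(f"t = {t_stat:.2f}, df = {df:.1f}, approximate 95% CI = ({interval[0]:.1f}, {interval[1]:.1f})")
```

If each man's mark were paired with a particular woman's mark (for example, siblings or matched study partners), a matched-pairs analysis of the differences would be needed instead.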
Attachment: Data sources.pdf