You are expected to submit professionally presented word-processed assessment documents.
This includes:
- A title page showing: ID number/s, name/s, lecturers' name/s, and assessment title.
- Correct spelling and appropriate use of grammar.
- Pages numbered including a contents page.
- Stapled or bound (no paper clips/plastic folders or plastic sleeves).
- Questions correctly labelled and numbered with clear and consistent headings.
- For main text use Times New Roman / Arial / Calibri font type and 12pt font size.
- Line spacing no less than 1.5 and no greater than double.
- A complete reference list should be included at the back of the assessment using the Harvard AGPS style of referencing, with in-text citations.
Learning Objectives: Applicable course objectives:
- LO-1: Demonstrate applied knowledge of people, markets, finances, technology and management in a global context of business intelligence and understand the necessity of data driven decision-making.
- LO-2: Understand the resulting organisational change for business intelligence practice (data warehouse design, data mining process, data visualisation and performance management) and how these apply to ......business processes.
- LO-3: Identify and solve complex organisational problems creatively and practically through ......problems.
- LO-4: Comprehend the changing organisational culture and address complex ethical dilemmas that arise from evidence based decision making and business performance management.
- LO-5: Demonstrate the ability to communicate effectively in a clear and concise manner in written report style for senior management.
Task 1: Exploratory Data Analysis, model building and cross validation through RapidMiner
The objective of Task 1 is to predict the probability of rainfall for tomorrow (next day) based on today's weather conditions. In Task 1, you are required to use the data mining tool RapidMiner to analyse and report on the weather2008-17.csv data set provided for Assignment 2. You should review the data dictionary for weather2008-17.csv data set (see Table 1.1 below).
The weather dataset contains 138,307 daily observations from January 2008 through to January 2017, drawn from 49 weather stations. In completing Task 1 of Assignment 2 you will need to apply the business understanding, data understanding, data preparation, modelling and evaluation phases of the CRISP-DM data mining process.
Task 1.1 Conduct an exploratory data analysis of the weather2008-17.csv data set using RapidMiner to understand the characteristics of each variable and its relationship to the other variables in the data set. Summarise your findings in a table named 'Results of Exploratory Data Analysis for weather2008-17 Data Set', describing the key characteristics of each variable: maximum and minimum values, average, standard deviation, most frequent values (mode), missing values, invalid or inconsistent values, other characteristics where appropriate for this analysis, and relationships with other variables.
Briefly discuss the key results of your exploratory data analysis and the justification for selecting your five (5) top variables for predicting whether it is likely to rain tomorrow based on today's weather conditions. (About 500 words)
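As a rough cross-check on the statistics that RapidMiner reports, the per-variable summary described in Task 1.1 can be sketched in plain Python. This is an illustration only: the sample values and the `summarise` helper are hypothetical, empty strings are assumed to mark missing values, and the RapidMiner output remains the required deliverable.

```python
import statistics

def summarise(values):
    """Summarise one numeric column; empty strings are treated as missing."""
    present = [float(v) for v in values if v != ""]
    return {
        "min": min(present),
        "max": max(present),
        "mean": statistics.mean(present),
        "stdev": statistics.stdev(present),  # sample standard deviation
        "mode": statistics.mode(present),    # most frequent value
        "missing": len(values) - len(present),
    }

# Hypothetical sample column, not taken from weather2008-17.csv
sample = ["12.5", "", "14.0", "12.5", "9.0"]
summary = summarise(sample)
print(summary)
```

The same helper could be applied column by column to fill one row of the results table per variable.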
Task 1.2 Build a Decision Tree model for predicting whether it is likely to rain tomorrow based on today's weather conditions, using RapidMiner, an appropriate set of data mining operators, and a reduced weather2008-17.csv data set determined by your exploratory data analysis in Task 1.1. Provide these outputs from RapidMiner: (1) Final Decision Tree Model process, (2) Final Decision Tree diagram, and (3) associated decision tree rules.
Briefly explain your final Decision Tree Model Process, and discuss the results of the Final Decision Tree Model drawing on the key outputs (Decision Tree Diagram, Decision Tree Rules) for predicting whether it is likely to rain tomorrow based on today's weather conditions and relevant supporting literature on the interpretation of decision trees (About 250 words).
Task 1.3 Create a Weka Logistic Regression model for predicting whether it is likely to rain tomorrow based on today's weather conditions, using RapidMiner, an appropriate set of data mining operators, and a reduced weather2008-17.csv data set determined by your exploratory data analysis in Task 1.1. Provide these outputs from RapidMiner: (1) Final Logistic Regression Model process, (2) Coefficients, and (3) Odds Ratios.
Briefly explain your final Logistic Regression Model Process, and discuss the results of the Final Logistic Regression Model drawing on the key outputs (Coefficients, Odds Ratios) for predicting whether it is likely to rain tomorrow based on today's weather conditions and relevant supporting literature on the interpretation of logistic regression models (About 250 words). (2+3+2=7 marks)
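When interpreting the Task 1.3 outputs, it may help to recall that an odds ratio is the exponential of the corresponding logistic regression coefficient. The sketch below shows that relationship only; the predictor names and coefficient values are hypothetical and are not results from the weather data.

```python
import math

def odds_ratios(coefficients):
    """Convert logistic regression coefficients (log-odds) to odds ratios."""
    return {name: math.exp(beta) for name, beta in coefficients.items()}

# Hypothetical coefficients for illustration only
coefficients = {"predictor_a": 0.05, "predictor_b": -0.10}
ratios = odds_ratios(coefficients)
for name, ratio in ratios.items():
    print(f"{name}: odds ratio = {ratio:.3f}")
```

An odds ratio above 1 indicates that the odds of the outcome rise as the predictor increases, and a value below 1 indicates that they fall.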
Task 1.4 You will need to validate your Final Decision Tree Model and Final Logistic Regression Model. Note that you will need to use the X-Validation Operator, Apply Model Operator and Performance Operator in your data mining process models here.
Discuss and compare the accuracy of your Final Decision Tree Model with that of the Final Logistic Regression Model for predicting whether it is likely to rain tomorrow based on today's weather conditions, based on the results of the confusion matrix, ROC chart and Lift chart for each final model. You should use a table here to compare the key results of the confusion matrix for the Final Decision Tree Model and Final Logistic Regression Model (About 250 words).
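For the comparison table in Task 1.4, the usual metrics can be derived from the four cells of each confusion matrix. The following is a minimal sketch, assuming a binary rain / no-rain outcome; the `confusion_metrics` helper and all counts are hypothetical, and the real figures should come from the RapidMiner Performance Operator.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Derive common metrics from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,   # share of all predictions that are correct
        "precision": tp / (tp + fp),     # share of predicted rain days that had rain
        "recall": tp / (tp + fn),        # share of actual rain days that were predicted
    }

# Hypothetical counts for two models, not real results
models = {
    "Decision Tree": confusion_metrics(tp=60, fp=20, fn=40, tn=380),
    "Logistic Regression": confusion_metrics(tp=70, fp=30, fn=30, tn=370),
}
for name, metrics in models.items():
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```

In this made-up example both models have the same accuracy but differ in precision and recall, which is why the table should report more than accuracy alone.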
Task 2 Data Warehousing, Big Data, and Contemporary Issues
- LO1, LO2 (15 marks) & LO4
For Tasks 2.1 to 2.3, research the relevant literature on how big data analytics capability can be incorporated into organizational data warehouse architecture and address the requirements below:
Task 2.1 Develop an advanced high-level data warehouse architecture design for a large state-owned water utility company that incorporates both organizational structured data and big data capture, processing, storage and presentation in a single diagram called 'Big Data Analytics and Data Warehouse Combined' (about 50 words).
Task 2.2 Describe and justify the main components of the high-level data warehouse architecture design with incorporated big data capability that you presented in Task 2.1, with appropriate literature support (about 750 words).
Task 2.3 Identify and critically analyse the key security, privacy and ethical concerns for organisations within a specific industry that are already using a big data analytics and algorithmic approach to decision making with appropriate in-text referencing support (about 700 words).
Task 3 Tableau Desktop Dashboard
Assume you are the Tableau specialist of a New Zealand based Data Analytics Company which is helping its client US Aviation LLC (an American aircraft manufacturer) to better understand wildlife strikes with aircraft, their causes and overall impacts. The aviation-wildlife.csv file lists historical data recorded for the American aviation industry regarding wildlife strikes with aircraft for the years 2000 to 2011. See Table 3.1, which provides the data dictionary for the aviation-wildlife.csv dataset.
| Variable Name | Data Type | Description |
|---|---|---|
| 1. Aircraft: Type | Categorical | Aircraft, Helicopter |
| 2. Airport: Name | Categorical | Name of Airport |
| 3. Altitude-Bin | Categorical | < 1000 Metres, > 1000 Metres, Unknown |
| 4. Aircraft: Make/Model | Categorical | Make and Model of Aircraft |
| 5. Wildlife: Number struck | Categorical | Range of numbers |
| 6. Effect: Impact to flight | Categorical | None, Aborted Take-off, Engine Shut Down, Precautionary Landing, Other |
| 7. Effect: Other | Categorical | Text remarks recorded for flight |
| 8. Location: Nearby if en route | Categorical | State Abbreviation |
| 9. Aircraft: Flight Number | Real | |
| 10. Flight Date | Date | Date of Flight |
| 11. Record ID | Integer | Record ID - unique integer number |
| 12. Effect: Indicated Damage | Categorical | No Damage, Caused Damage |
| 13. Location: Freeform en route | Categorical | Text remark recorded for flight |
| 14. Aircraft: Number of engines? | Integer | 1, 2, 3 or 4 |
| 15. Aircraft: Airline/Operator | Categorical | Airline Operator |
| 16. Origin State | Categorical | Flight Origin State |
| 17. When: Phase of flight | Categorical | Take-off run, Approach, Climb, En-route, Landing Roll |
| 18. Conditions: Precipitation | Categorical | Fog, None, Rain, Snow |
| 19. Remains of wildlife collected? | Categorical | False, True |
| 20. Remains of wildlife sent to Smithsonian | Categorical | False, True |
| 21. Remarks | Categorical | Text remarks recorded regarding aviation-wildlife collision |
| 22. Reported: Date | Date | Date aircraft collision with wildlife reported |
| 23. Wildlife: Size | Categorical | Small, Medium, Large |
| 24. Conditions: Sky | Categorical | No Cloud, Overcast, Some Cloud |
| 25. Wildlife: Species | Categorical | Different types of wildlife, mainly birds |
| 26. When: Time (HHMM) | Categorical | 24 hour format |
| 27. When: Time of day | Categorical | Dawn, Day, Night, Dusk |
It is important for you to understand the variables in this dataset in order to build the required Aircraft Wildlife Strikes (AWS) dashboard with four specified Tableau views.
Task 3 requires you to build a Tableau dashboard which includes four different views of the aviation-wildlife.csv data set for the years 2000-2011 as specified in sub Tasks 3.1, 3.2, 3.3 and 3.4.
Task 3.1 Create a Tableau View of the impact of wildlife strikes with aircraft over time for a specific origin state. Provide a screen capture of the Tableau view you have created, describe it, and comment on the different types of impact to aircraft from wildlife strikes over time and whether this differs much for different origin states (about 125 words).
Task 3.2 Create a Tableau View of flight phase by time of the day which shows when wildlife strikes with aircraft occur. Provide a screen capture of the Tableau view you have created, describe it, and comment on which phase of a flight and time of the day wildlife strikes with aircraft are more likely to occur (about 125 words).
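The aggregation behind a phase-of-flight by time-of-day view is a simple count of records per pair of values. The sketch below illustrates that idea outside Tableau; the records are hypothetical and only reuse category labels from Table 3.1, so the counts say nothing about the real data.

```python
from collections import Counter

# Hypothetical records as (phase of flight, time of day) pairs
records = [
    ("Approach", "Day"),
    ("Approach", "Night"),
    ("Climb", "Day"),
    ("Approach", "Day"),
    ("Landing Roll", "Dusk"),
]

counts = Counter(records)  # number of strikes per (phase, time of day)
for (phase, time_of_day), n in counts.most_common():
    print(f"{phase:<13}{time_of_day:<6}{n}")
```

A count like this can serve as a sanity check on the totals shown in the Tableau view.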
Task 3.3 Create a Tableau View that compares wildlife species in order of aircraft strike frequency and the chance of damage occurring. Provide a screen capture of the Tableau view you have created and comment on which wildlife species are most frequently involved in aircraft strikes and which wildlife species are most likely to have the most impact in terms of damage (total cost) when an aircraft strike occurs (about 125 words).
Task 3.4 Create a Tableau Geo-Map View of flights by origin state that displays the number of wildlife strikes and total monetary cost for each origin state for different periods of time. Provide a screen capture of the Tableau view you have created, describe it, and comment on this Geo-Map View in relation to the number of wildlife strikes by origin state and total monetary cost over time. A number of origin states cannot be plotted on the Geo-Map View because they are outside the USA; comment on how you can deal with this issue (About 125 words).
Task 3.5 Provide a screen snapshot of your AWS Dashboard and an accompanying rationale (drawing on the relevant literature for good dashboard design) for the graphic design and functionality that is provided by your AWS Dashboard for the four specified Tableau views for sub Tasks 3.1, 3.2, 3.3 and 3.4 (About 500 words).
You will need to submit your Tableau workbook in .twbx format which contains your dashboard as a separate document to your main report for Assignment 2.
Report Presentation, Writing Style and Quality References
- LO5
Presentation: use of formatting, spacing, paragraphs, table of contents, list of tables and diagrams, introduction, conclusion, Appendix
Writing style: Use of English (Correct use of language, grammar, spelling and proofreading)
Referencing: Appropriate level of in-text referencing where required, reference list provided, Harvard referencing style used correctly
Your assignment 2 report must be structured in report format as follows:
- UUNZ Cover page for assignment 2 report
- Title Page
- Executive Summary
- Table of Contents (Including List of Tables and Figures)
1. Body of report: main sections and subsections for each assignment task and sub task
1.1 Task 1.1 will be under an appropriate sub heading, then likewise for sub tasks 1.2, 1.3 and so on.
Writing Style and Online Assignment submission
This assignment must be the expression of your own work. Use English correctly, including correct language and grammar, spell-checking and proofreading.
All assignments must first be submitted electronically via the course study desk "Turnitin Check Link: Assignment-2". Turnitin (plagiarism software) then performs an automated check for plagiarism, collusion and cheating. After that, you need to submit the Turnitin-generated originality report (.pdf) together with the Tableau file (.twbx) via the uPortal "Assignment-2 submission link".
Note carefully UUNZ policy on Academic Misconduct such as plagiarism, collusion and cheating. If any of these occur they will be found and dealt with by the UUNZ Academic Integrity Procedures.
Harvard AGPS Referencing Requirement:
The Harvard AGPS referencing style and in-text citations must be used in appropriate places. Study the referencing techniques for Harvard AGPS Referencing. UUNZ TPS (Tertiary Programme Support) classes will help you to present your assignment in the correct report writing format and Harvard AGPS style of referencing.
Attachment:- Specifications.rar