Assignment
Scenario
You have been hired as a consultant by a Victorian-based car rental firm who want you to design and develop a data warehouse to integrate information which is currently held in several separate databases and spreadsheets. They have provided you with several samples of data which can be used as basis for designing and implementing a suitable conformed dimensional design and ETL process.
The following data files are available via Moodle:
- Details of the firm's three franchises
- The firm's current inventory of cars
- The records of the firm's customer loyalty program
- For each franchise, the records of all transactions over roughly a 2-month period. Three different types of transactions have been recorded - the initial booking, the pick-up of a car, and the return of the car. Different attributes have been kept for each type of transaction. The datasets are reasonably clean, but some issues may remain.
Requirements
There are four separate components required for your submission.
- Dimensional Design
Analyse the structure of the existing data, and design suitable dimension and fact tables to store this data in a conformed dimensional structure. Document your design using any suitable drawing tool such as Visio draw.io then export it as a PDF or PNG file for submission and marking.
- Database Schema
Implement the design from the previous stage as a database schema defined by a SQL file which can be imported into XAMPP.
- ETL Process
Use the Pentaho Community Edition software used in Lab 4 to implement an ETL system which can load the data from the various files provided by the firm, perform appropriate transformations, and load it into your database. Your solution should be robust to any issues arising in the source data, such as inconsistencies, missing values, or duplicate entries.
- Report
Create and submit a brief report documenting your ETL system. In particular explain any decisions which you had to make regarding treatment of data issues. If you have been unable to complete any aspects of the ETL system, or if some parts do not function correctly, document these limitations.
Learning Outcomes Assessed
The following course learning outcomes are assessed by completing this assessment:
K2. Describe data management, storage, processing needs for big data analytics.
S2. Use data loading and formatting tools.
A1. Perform appropriate data collection and processing techniques.
A2. Demonstrate the use of tools to collect, store, process and use data for big data analytics.