The Big Data Assignment consists of two parts:
- The first part is to create the algorithms in the tasks, namely Decision Tree, Gradient Boosted Tree and Linear regression, and then to apply them to the bike sharing dataset provided. Try to reproduce the output given in the task sections (also given in the Big-Data Assignment.docx provided on Blackboard).
- The second part is to take the algorithms created in the first part and apply them to another dataset chosen from Kaggle (other than the bike sharing dataset provided).
1. Using Python 3, build the following regression models:
- Decision Tree
- Gradient Boosted Tree
- Linear regression
2. Select a dataset (other than the example dataset given in section 3) and apply the Decision Tree and Linear regression models created above.
3. Build the following in relation to the gradient boost tree and the dataset chosen in step 2
a) Gradient boost tree iterations (see Big-Data Assignment.docx section 6.1)
b) Gradient boost tree Max Bins (see Big-Data Assignment.docx section 7.2)
4. Build the following in relation to the decision tree and the dataset chosen in step 2
a) Decision Tree Categorical features
b) Decision Tree Log (see Big-Data Assignment.docx section 5.4)
c) Decision Tree Max Bins (see Big-Data Assignment.docx section 7.2)
d) Decision Tree Max Depth (see Big-Data Assignment.docx section 7.1)
5. Build the following in relation to the linear regression and the dataset chosen in step 2
a) Linear regression Cross Validation
i. Intercept (see Big-Data Assignment.docx section 6.5)
ii. Iterations (see Big-Data Assignment.docx section 6.1)
iii. Step size (see Big-Data Assignment.docx section 6.2)
iv. L1 Regularization (see Big-Data Assignment.docx section 6.4)
v. L2 Regularization (see Big-Data Assignment.docx section 6.3)
b) Linear regression Log (see Big-Data Assignment.docx section 5.4)
6. Follow the provided example of the bike sharing dataset and the guidelines in the sections that follow this section to develop the requirements given in steps 1, 3, 4 and 5.
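The "Log" items in steps 4 and 5 are not spelled out here. The sketch below assumes they mean fitting a model on a log-transformed target and mapping predictions back afterwards; that reading, the helper names and the sample counts are all assumptions for illustration, and section 5.4 of Big-Data Assignment.docx remains the authority. It uses only the Python 3 standard library.

```python
import math

def log_transform(targets):
    """Map each non-negative target y to log(1 + y)."""
    return [math.log1p(y) for y in targets]

def inverse_log_transform(values):
    """Undo log_transform: map z back to exp(z) - 1."""
    return [math.expm1(z) for z in values]

def rmsle(actual, predicted):
    """Root mean squared log error between two equal-length sequences."""
    squared = [(math.log1p(p) - math.log1p(a)) ** 2
               for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Made-up rental counts, used only to exercise the helpers.
counts = [16.0, 40.0, 32.0, 13.0, 1.0]
log_counts = log_transform(counts)            # what a "Log" model would train on
restored = inverse_log_transform(log_counts)  # back on the original scale
```

A model trained on `log_counts` would have its predictions passed through `inverse_log_transform` before being compared with the raw targets. The error metric (`rmsle`) is one plausible choice, not one this brief prescribes.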
3.1 Task 1
Task 1 is comprised of developing:
1. Decision Tree
a) Decision Tree Categorical features
b) Decision Tree Log (see Big-Data Assignment.docx section 5.4)
c) Decision Tree Max Bins (see Big-Data Assignment.docx section 7.2)
d) Decision Tree Max Depth (see Big-Data Assignment.docx section 7.1)
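To make the Max Depth and Max Bins settings concrete, here is a minimal sketch of a regression tree in plain Python 3. It is not the implementation the assignment document uses: the function names, the way bins are chosen (evenly spaced ranks of the sorted split points) and the tiny dataset are all assumptions for illustration. Categorical features and the Log variant are not covered.

```python
def mean(values):
    return sum(values) / len(values)

def sse(values):
    """Sum of squared errors around the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def candidate_thresholds(xs, max_bins):
    """Return at most max_bins - 1 split points for one numeric feature."""
    if max_bins < 2:
        raise ValueError("max_bins must be at least 2")
    distinct = sorted(set(xs))
    midpoints = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    if len(midpoints) <= max_bins - 1:
        return midpoints
    step = len(midpoints) / (max_bins - 1)
    return [midpoints[int(i * step)] for i in range(max_bins - 1)]

def build_tree(rows, targets, max_depth, max_bins):
    """Grow a tree greedily; a leaf is a float, an inner node is a dict."""
    if max_depth == 0 or len(set(targets)) == 1:
        return mean(targets)
    best = None
    for feature in range(len(rows[0])):
        column = [row[feature] for row in rows]
        for threshold in candidate_thresholds(column, max_bins):
            left = [t for x, t in zip(column, targets) if x <= threshold]
            right = [t for x, t in zip(column, targets) if x > threshold]
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, feature, threshold)
    if best is None:  # no feature has two distinct values left
        return mean(targets)
    _, feature, threshold = best
    left_idx = [i for i, row in enumerate(rows) if row[feature] <= threshold]
    right_idx = [i for i, row in enumerate(rows) if row[feature] > threshold]
    return {
        "feature": feature,
        "threshold": threshold,
        "left": build_tree([rows[i] for i in left_idx],
                           [targets[i] for i in left_idx],
                           max_depth - 1, max_bins),
        "right": build_tree([rows[i] for i in right_idx],
                            [targets[i] for i in right_idx],
                            max_depth - 1, max_bins),
    }

def predict(tree, row):
    while isinstance(tree, dict):
        side = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[side]
    return tree

# Made-up data: one numeric feature and a target that changes in steps.
hours = [[h] for h in range(8)]
rentals = [1, 1, 5, 5, 9, 9, 3, 3]

def training_mse(max_depth, max_bins):
    tree = build_tree(hours, rentals, max_depth, max_bins)
    errors = [(predict(tree, r) - t) ** 2 for r, t in zip(hours, rentals)]
    return sum(errors) / len(errors)

# Sweep max depth with plenty of bins, then restrict the bins at a fixed depth.
depth_errors = [training_mse(d, max_bins=32) for d in (0, 1, 2, 3)]
coarse_error = training_mse(3, max_bins=2)
```

On this toy data the training error cannot rise as the depth grows, and restricting the bins can only remove candidate splits. On a real dataset the same sweep would be scored on held-out data rather than on the training rows.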
3.2 Task 2
Task 2 is comprised of developing:
1. Gradient boost tree
a) Gradient boost tree iterations (see Big-Data Assignment.docx section 6.1)
b) Gradient boost tree Max Bins (see Big-Data Assignment.docx section 7.2)
c) Gradient boost tree Max Depth (see Big-Data Assignment.docx section 7.1)
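The effect of the iterations setting can be sketched with a small gradient boosting loop for squared error, where every iteration fits a one-split tree (a stump) to the current residuals. This is an illustrative sketch in plain Python 3 and not the method of the assignment document: the function names, the learning rate of 0.5 and the data are assumptions.

```python
def fit_stump(xs, residuals):
    """Best single split on one numeric feature (needs 2+ distinct values)."""
    best = None
    distinct = sorted(set(xs))
    for a, b in zip(distinct, distinct[1:]):
        threshold = (a + b) / 2
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]

def fit_boosted(xs, ys, num_iterations, learning_rate=0.5):
    """Start from the mean, then add one stump per iteration."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(num_iterations):
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_value, right_value = fit_stump(xs, residuals)
        stumps.append((threshold, left_value, right_value))
        predictions = [
            p + learning_rate * (left_value if x <= threshold else right_value)
            for x, p in zip(xs, predictions)
        ]
    return base, stumps, learning_rate

def predict_boosted(model, x):
    base, stumps, learning_rate = model
    return base + sum(learning_rate * (left if x <= t else right)
                      for t, left, right in stumps)

def training_mse(model, xs, ys):
    return sum((predict_boosted(model, x) - y) ** 2
               for x, y in zip(xs, ys)) / len(ys)

# Made-up data, used only to show the sweep over the iteration count.
hours = list(range(8))
rentals = [1, 1, 5, 5, 9, 9, 3, 3]
iteration_errors = [
    training_mse(fit_boosted(hours, rentals, n), hours, rentals)
    for n in (1, 5, 20)
]
```

Each extra iteration can only lower the training error here, which is why the iteration count would normally be chosen against held-out data. Max Bins and Max Depth would apply to the individual trees in the same way as in a single decision tree.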
3.3 Task 3
Task 3 is comprised of developing:
1. Linear regression model
a) Linear regression Cross Validation
i. Intercept (see Big-Data Assignment.docx section 6.5)
ii. Iterations (see Big-Data Assignment.docx section 6.1)
iii. Step size (see Big-Data Assignment.docx section 6.2)
iv. L1 Regularization (see Big-Data Assignment.docx section 6.4)
v. L2 Regularization (see Big-Data Assignment.docx section 6.3)
b) Linear regression Log (see Big-Data Assignment.docx section 5.4)
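The five settings listed under Cross Validation can be seen together in a minimal batch gradient descent sketch for linear regression. It is written in plain Python 3 under stated assumptions: the function name, the parameter names, the use of a subgradient for the L1 term and the data are all illustrative, and the cross-validation procedure itself is not shown.

```python
def train_linear(rows, targets, iterations=100, step_size=0.01,
                 l1=0.0, l2=0.0, intercept=True):
    """Fit weights (and optionally a bias) by batch gradient descent."""
    n = len(rows)
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(iterations):
        # Prediction errors under the current parameters.
        errors = [
            sum(w * x for w, x in zip(weights, row)) + bias - t
            for row, t in zip(rows, targets)
        ]
        for j in range(n_features):
            grad = sum(e * row[j] for e, row in zip(errors, rows)) / n
            grad += l2 * weights[j]                            # L2 penalty
            grad += l1 * ((weights[j] > 0) - (weights[j] < 0))  # L1 subgradient
            weights[j] -= step_size * grad
        if intercept:
            bias -= step_size * sum(errors) / n  # the bias is not penalised
    return weights, bias

# Made-up data that follows y = 2x + 1 exactly.
rows = [[0.0], [1.0], [2.0], [3.0], [4.0]]
targets = [1.0, 3.0, 5.0, 7.0, 9.0]

plain_w, plain_b = train_linear(rows, targets, iterations=2000, step_size=0.05)
no_int_w, no_int_b = train_linear(rows, targets, iterations=2000,
                                  step_size=0.05, intercept=False)
ridge_w, _ = train_linear(rows, targets, iterations=2000, step_size=0.05, l2=1.0)
lasso_w, _ = train_linear(rows, targets, iterations=2000, step_size=0.05, l1=1.0)
```

Comparing `plain_w` with `no_int_w`, `ridge_w` and `lasso_w` shows how dropping the intercept or adding a penalty changes the fitted slope. Repeating the call with different `iterations` or `step_size` values gives the remaining two sweeps.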
Attachment:- Marking Creiteria.rar