Machine Learning with Big Data Assignment
(1) Multilayer Perceptron Classifier: Consider the News Popularity Prediction dataset from the UCI machine learning repository. The articles were published by Mashable, and the dataset consists of a single CSV file, OnlineNewsPopularity.csv. The dataset contains a combination of integer-valued and real-valued features. Most of the categorical features are already in a format suitable for the multilayer perceptron (the "one-hot encoded" format). Details of the dataset are described in the above link. Spend some time understanding the structure of the dataset: how the instances are organized, how the features are organized, what the various features mean, and so on. Do not attempt to run any machine learning algorithm before understanding the structure of the dataset.
Using this dataset, implement a Multilayer Perceptron Classifier (MLP) to predict whether an article is popular or not.
Popularity: In this assignment, we will consider an article to be popular if its number of shares is greater than 1400.
Therefore, you have a binary classification problem, where articles with more than 1400 shares belong to class 1 and all others belong to class 0.
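The labeling rule above can be sketched in a few lines of Python. This is only an illustration: the function name label_popular and the example share counts are hypothetical, and in practice the rule would be applied to the share-count column of OnlineNewsPopularity.csv.

```python
def label_popular(shares, threshold=1400):
    """Return 1 for each article with more than `threshold` shares, else 0."""
    return [1 if s > threshold else 0 for s in shares]


# Hypothetical share counts, for illustration only (not taken from the dataset).
example_shares = [593, 1400, 1401, 25000]
labels = label_popular(example_shares)
print(labels)  # [0, 0, 1, 1]
```

Note that an article with exactly 1400 shares falls into class 0, since the rule is "greater than 1400".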
Implement different MLP architectures for the following scenarios and report the precision, recall, and fscore for 5-fold cross validation:
Layers:
a. Use 2 hidden layers with 20 nodes in each layer.
b. Use 2 hidden layers with 100 nodes in each layer.
c. Use 5 hidden layers with 20 nodes in each layer.
d. Use 5 hidden layers with 100 nodes in each layer.
Activation Function:
a. Use 'relu'
b. Use 'tanh'
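One possible way to organize the eight scenarios (four layer layouts times two activation functions) with scikit-learn is sketched below. It is a sketch under several assumptions, not a reference solution: the data is a synthetic stand-in generated with make_classification, the features are standardized before training, the folds are stratified and shuffled, max_iter is fixed at 200, and "fscore" is taken to mean the F1 score. None of these choices is prescribed by the assignment.

```python
from itertools import product

from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption); replace with the features and
# 0/1 popularity labels built from OnlineNewsPopularity.csv.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# Hidden-layer layouts for scenarios a-d.
layer_options = {
    "a": (20, 20),
    "b": (100, 100),
    "c": (20,) * 5,
    "d": (100,) * 5,
}
activations = ["relu", "tanh"]
metrics = ("precision", "recall", "f1")

# 5-fold cross-validation; stratification and shuffling are assumptions.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

results = {}
for (name, layers), activation in product(layer_options.items(), activations):
    # Scaling inside the pipeline keeps each fold's test split unseen.
    model = make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=layers, activation=activation,
                      max_iter=200, random_state=0),
    )
    scores = cross_validate(model, X, y, cv=cv, scoring=metrics)
    # Average each metric over the 5 folds.
    results[(name, activation)] = {
        m: scores[f"test_{m}"].mean() for m in metrics
    }

for key, averaged in results.items():
    print(key, {m: round(v, 3) for m, v in averaged.items()})
```

Placing the scaler inside the pipeline means it is refit on the training portion of every fold, which avoids leaking test-fold statistics into training.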
Compare the precision, recall, and fscore across these scenarios and discuss the performance of each architecture in your report.
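For reference when comparing the scenarios, the three metrics can be computed directly from true-positive, false-positive, and false-negative counts. The sketch below assumes that "fscore" means the F1 score (the harmonic mean of precision and recall) and that class 1 is the positive class; the function name and the example labels are hypothetical.

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 with class 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    # Fall back to 0.0 when a denominator would be zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Hypothetical labels: 2 true positives, 1 false positive, 1 false negative.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)
```

In this example precision and recall are both 2/3, so F1 is also 2/3. A model that predicts class 1 for almost every article can reach high recall with poor precision, which is why the metrics should be compared together rather than in isolation.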
(2) In the scikit-learn package, the default value for the learning rate of an MLP (the learning_rate_init parameter of MLPClassifier) is 0.001. Explain what will happen to the classification result if we set this parameter to 0.5, and why.
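As a starting point for thinking about this question, the toy sketch below runs plain gradient descent on the one-dimensional function f(w) = 3w^2 with both step sizes. This is an assumption-laden illustration rather than a model of the real classifier: MLPClassifier uses the Adam solver by default and optimizes a non-convex loss over many weights, so the actual behavior should still be verified experimentally on the dataset.

```python
def gradient_descent(learning_rate, steps=10, w0=1.0):
    """Minimize f(w) = 3 * w**2 with plain gradient descent from w0."""
    w = w0
    history = [w]
    for _ in range(steps):
        grad = 6.0 * w                 # derivative of 3 * w**2
        w = w - learning_rate * grad   # standard gradient step
        history.append(w)
    return history


small = gradient_descent(0.001)  # each step multiplies w by 0.994
large = gradient_descent(0.5)    # each step multiplies w by -2

print(abs(small[-1]))  # slightly below 1: slow but steady progress
print(abs(large[-1]))  # 1024.0: the iterate overshoots and grows
```

With the small step the iterate moves steadily toward the minimum at w = 0, while with the large step every update overshoots the minimum, flips sign, and doubles in magnitude. On this toy problem a step size that is too large prevents convergence entirely.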