Assignment task:
You are performing text mining on a dataset of 200 customer reviews. Answer the following questions:
Suppose each review was limited to no more than fifty words. In the term-document matrix, which dimension is more likely to be larger, the number of documents or the number of terms? Explain your choice in one sentence.
You are considering using stemming or lemmatization to process the review text. The term 'increasing' appears in many reviews. What are the results of stemming and of lemmatizing this term, respectively?
In addition to the review text, each customer provided a rating score from 1 star (poor) to 5 stars (excellent). Suppose your text mining task is to predict the ratings from the review text. Which of the three techniques below is NOT appropriate for this task? Choose only one answer.
(i) J48 decision tree algorithm
(ii) support vector regression
(iii) k-means algorithm
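
For reference, the term-document matrix mentioned in the first question can be sketched as below. This is only a minimal illustration, assuming lowercase whitespace tokenization and three made-up reviews that are not taken from the dataset; the function name build_term_document_matrix is hypothetical.

```python
from collections import Counter


def build_term_document_matrix(documents):
    """Return (terms, matrix), where matrix[i][j] counts term i in document j."""
    # Assumption: lowercase whitespace tokenization. A real pipeline would
    # usually also strip punctuation and remove stop words.
    tokenized = [doc.lower().split() for doc in documents]
    # One row per distinct term, in alphabetical order.
    terms = sorted({token for tokens in tokenized for token in tokens})
    counts = [Counter(tokens) for tokens in tokenized]
    # One column per document.
    matrix = [[doc_counts[term] for doc_counts in counts] for term in terms]
    return terms, matrix


# Hypothetical toy reviews, invented for illustration only.
reviews = [
    "great product fast delivery",
    "poor quality slow delivery",
    "great quality great price",
]

terms, matrix = build_term_document_matrix(reviews)
print(len(terms), "terms x", len(reviews), "documents")
```

The rows correspond to distinct terms and the columns to documents, so the two dimensions asked about in the first question are len(terms) and len(reviews).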