Problem
The objective of this project is to perform word frequency analysis.
The provided dataset contains Elon Musk's tweets from 2010 to 2022. For the analysis, consider only the last five years, 2018 to 2022. Each year has thousands of tweets. Treat each year as a single document: all tweets from one year together form that year's document.
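A minimal Python sketch of the year-as-document setup. It assumes, hypothetically, that the tweets have already been loaded as (timestamp, text) pairs with ISO-8601 timestamps such as "2019-06-01 12:00:00"; the real file's column names and date format may differ. The function name group_by_year is illustrative, and the sample tweets are placeholders, not real data.

```python
from collections import defaultdict
from datetime import datetime


def group_by_year(tweets, years=range(2018, 2023)):
    """Concatenate tweet texts into one document per year.

    `tweets` is assumed (hypothetically) to be an iterable of
    (timestamp, text) pairs with ISO-8601 timestamps.
    """
    wanted = set(years)
    texts_by_year = defaultdict(list)
    for timestamp, text in tweets:
        year = datetime.fromisoformat(timestamp).year
        if year in wanted:  # keep only 2018-2022 by default
            texts_by_year[year].append(text)
    # Join each year's tweets so that one year becomes one document.
    return {year: "\n".join(texts) for year, texts in sorted(texts_by_year.items())}


# Placeholder tweets for illustration only, not real data.
sample = [
    ("2017-12-31 23:59:00", "tweet a"),
    ("2018-02-06 20:45:00", "tweet b"),
    ("2018-07-01 10:00:00", "tweet c"),
    ("2022-04-25 19:43:00", "tweet d"),
]
docs = group_by_year(sample)
print(docs)
```

The result maps each year to one string, which the per-year steps can treat as that year's document.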
I. Compute the term frequencies for each year, normalized to the range [0, 1]. Exclude stopwords.
II. Show the ten words with the highest term frequency for each year.
III. Plot a histogram of the word frequencies for each year.
IV. Demonstrate Zipf's law by plotting word frequency against rank on log-log axes for each year.
V. Use TF-IDF to compute and show the five most "important" words for each year.
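For tasks I and II, one possible Python sketch. Here normalization divides each word's count by the total number of non-stopword tokens in the year, so every value lies in [0, 1] and the values sum to 1; dividing by the year's largest count is another valid reading of "scale of [0, 1]". The STOPWORDS set is a hypothetical, deliberately short list; a real run would use a complete list such as NLTK's stopwords.words("english") or scikit-learn's ENGLISH_STOP_WORDS. URLs and @mentions are stripped first, since otherwise a link like https://t.co/... splits into noise tokens such as "https" and "co". The helper names are illustrative.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for illustration; a real run
# should use a complete English stopword list.
STOPWORDS = {
    "a", "an", "and", "are", "at", "be", "for", "in", "is", "it",
    "of", "on", "or", "that", "the", "this", "to", "was", "with",
}
URL_OR_MENTION = re.compile(r"https?://\S+|@\w+")
TOKEN = re.compile(r"[a-z0-9']+")


def tokenize(text):
    """Lower-case, drop URLs and @mentions, split into tokens, remove stopwords."""
    text = URL_OR_MENTION.sub(" ", text.lower())
    return [tok for tok in TOKEN.findall(text) if tok not in STOPWORDS]


def normalized_tf(text):
    """Map each word to count / total tokens, so values lie in [0, 1] and sum to 1."""
    counts = Counter(tokenize(text))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: n / total for word, n in counts.items()}


def top_k(tf, k=10):
    """Return the k highest-frequency (word, frequency) pairs; ties sorted alphabetically."""
    return sorted(tf.items(), key=lambda item: (-item[1], item[0]))[:k]


# Toy yearly documents standing in for the real concatenated tweets.
docs = {
    2018: "The rocket is ready. Rocket launch at dawn! https://t.co/xyz",
    2019: "Cars and rockets and cars.",
}
for year, text in docs.items():
    print(year, top_k(normalized_tf(text)))
```

No stemming is applied, so "rocket" and "rockets" count as different words; whether to merge such forms is a design choice left open by the task.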
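For task III, a sketch of the histogram. frequency_histogram is a hypothetical stdlib helper that shows what each bar counts (how many words fall into each frequency band); plot_frequency_histogram draws the same bins with matplotlib's Axes.hist, imported only when a figure is actually needed. The input is assumed to be a non-empty {word: normalized frequency} dict for one year. Because a few words dominate and most words are rare, the count axis is log-scaled so the long tail stays visible; that is a presentation choice, not part of the task.

```python
def frequency_histogram(tf, bins=20):
    """Count how many words fall into each of `bins` equal-width bands over [0, max]."""
    values = list(tf.values())
    width = max(values) / bins
    counts = [0] * bins
    for v in values:
        # min() keeps the largest value in the last bin instead of overflowing.
        counts[min(int(v / width), bins - 1)] += 1
    edges = [i * width for i in range(bins + 1)]
    return counts, edges


def plot_frequency_histogram(tf, year, bins=20):
    """Draw the histogram with matplotlib (imported lazily, only when plotting)."""
    import matplotlib.pyplot as plt

    values = list(tf.values())
    fig, ax = plt.subplots()
    # range=(0, max) gives the same bins as frequency_histogram above.
    ax.hist(values, bins=bins, range=(0, max(values)))
    ax.set_yscale("log")  # most words are rare; a log axis keeps the tail visible
    ax.set_xlabel("normalized term frequency")
    ax.set_ylabel("number of words")
    ax.set_title(f"Word-frequency histogram, {year}")
    return fig


# Toy normalized frequencies; in practice pass the dict computed for one year.
toy_tf = {"w1": 0.1, "w2": 0.1, "w3": 0.1, "w4": 0.7}
counts, edges = frequency_histogram(toy_tf, bins=2)
print(counts, edges)
```

Calling plot_frequency_histogram once per year and saving each figure (for example with fig.savefig) yields one histogram per document.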
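For task IV: Zipf's law says a word's frequency is roughly inversely proportional to its rank, f(r) ≈ C / r^s with s close to 1. Taking logarithms gives log f = log C - s log r, a straight line of slope -s on log-log axes. The sketch below sorts one year's frequencies into a rank-frequency list, estimates the slope with an ordinary least-squares fit on the logs (an extra check, not required by the task), and plots the data against the ideal f(1) / rank line. The function names are hypothetical, and matplotlib is imported only inside the plotting function.

```python
import math


def rank_frequency(tf):
    """Return (ranks, frequencies) sorted so that rank 1 is the most frequent word."""
    freqs = sorted(tf.values(), reverse=True)
    ranks = list(range(1, len(freqs) + 1))
    return ranks, freqs


def zipf_slope(ranks, freqs):
    """Least-squares slope of log(frequency) vs log(rank); Zipf predicts about -1."""
    xs = [math.log(r) for r in ranks]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var


def plot_zipf(tf, year):
    """Log-log plot of frequency vs rank for one year, with an ideal 1/rank reference."""
    import matplotlib.pyplot as plt  # lazy import: only needed for the figure

    ranks, freqs = rank_frequency(tf)
    fig, ax = plt.subplots()
    ax.loglog(ranks, freqs, marker=".", linestyle="none", label=str(year))
    # Reference line f(r) = f(1) / r: an ideal Zipf distribution with exponent 1.
    ax.loglog(ranks, [freqs[0] / r for r in ranks], linestyle="--", label="f(1) / rank")
    ax.set_xlabel("rank")
    ax.set_ylabel("normalized term frequency")
    ax.legend()
    return fig


# Synthetic data that follows Zipf's law exactly (f = 1 / rank), for illustration.
ideal_tf = {f"w{r}": 1 / r for r in range(1, 101)}
ranks, freqs = rank_frequency(ideal_tf)
print(round(zipf_slope(ranks, freqs), 6))
```

On the synthetic input the fitted slope is essentially -1; on real yearly documents, a roughly straight cloud of points with a slope near -1 is the expected sign of Zipf's law, with the usual deviations at the highest and lowest ranks.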
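For task V, a minimal TF-IDF sketch that treats the five yearly documents as the whole corpus. It uses one common weighting, tf(t, d) × ln(N / df(t)), where tf is the normalized frequency from task I, N is the number of yearly documents, and df(t) is how many of those documents contain t. A word used in every year therefore scores 0, which is what lets year-specific words rise to the top. Other variants exist: scikit-learn's TfidfVectorizer, for instance, defaults to a smoothed idf, ln((1 + N) / (1 + df)) + 1, and L2-normalizes each document vector, so its rankings can differ. The tokenizer, stopword list, and function name are simplified, hypothetical stand-ins.

```python
import math
import re
from collections import Counter

# Hypothetical minimal stopword list; substitute a full list in practice.
STOPWORDS = {"a", "an", "and", "in", "is", "of", "on", "the", "to"}
TOKEN = re.compile(r"[a-z0-9']+")


def tokenize(text):
    """Lower-case, split into tokens, and drop stopwords."""
    return [tok for tok in TOKEN.findall(text.lower()) if tok not in STOPWORDS]


def tfidf_top_words(docs, k=5):
    """Score words with tf(t, d) * ln(N / df(t)) and return the top k per document.

    `docs` maps a label (here, a year) to that year's concatenated text.
    """
    counts = {label: Counter(tokenize(text)) for label, text in docs.items()}
    n_docs = len(docs)
    # Document frequency: number of yearly documents that contain each word.
    df = Counter()
    for c in counts.values():
        df.update(c.keys())
    top = {}
    for label, c in counts.items():
        total = sum(c.values())
        scores = {w: (n / total) * math.log(n_docs / df[w]) for w, n in c.items()}
        top[label] = sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:k]
    return top


# Toy yearly documents, not real tweets.
docs = {
    2018: "rocket launch rocket tesla",
    2019: "tesla car car",
    2020: "tesla rocket",
}
for year, words in tfidf_top_words(docs, k=2).items():
    print(year, words)
```

In the toy run, "tesla" appears in every document and so gets a score of 0 everywhere, while "launch" and "car" lead their years because each occurs in only one document.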