QUESTION 1. HIVE OR PIG PROGRAMMING
Build a table to be used as the basis for a directed weighted graph (network) that shows which stations are connected. Use the trip count between those stations as the weights. Save this table in '/user/lab/q1' in HDFS.
Include the directory listing of the output directory and the first five lines of the output file in your submission.
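The edge table asked for in Question 1 amounts to a group-by on (start station, end station) with a count. The following is a minimal sketch of that logic in plain Python rather than Hive or Pig, meant only to show the shape of the expected output. The trip records, the station names and the helper build_edge_table are all hypothetical; the real column names in the attached data may differ.

```python
from collections import Counter

# Hypothetical trip records as (start_station, end_station) pairs.
# These values are made up; the attached data set has its own columns.
trips = [
    ("A", "B"),
    ("A", "B"),
    ("B", "C"),
    ("C", "A"),
    ("A", "C"),
    ("B", "C"),
    ("B", "C"),
]

def build_edge_table(trips):
    """Return sorted (start, end, trip_count) rows, one per directed route."""
    counts = Counter(trips)  # counts identical (start, end) pairs
    return sorted((start, end, n) for (start, end), n in counts.items())

edges = build_edge_table(trips)
for row in edges:
    print(row)
```

Each output row is one directed edge with its weight. In Hive the same idea would typically be a GROUP BY on the two station columns with COUNT(*), and in Pig a GROUP followed by COUNT.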
QUESTION 2. HIVE OR PIG PROGRAMMING
Load the data from '/user/lab/q1' that you created in the previous question. For each route, calculate a 'traffic index' where
Save the results in '/user/lab/q2' in HDFS. Include the directory listing of the output directory and the first five lines of the output file in your submission.
QUESTION 3. SPARK PROGRAMMING
Load the data from '/user/lab/q2' that you created in the previous question.
a) Find the 3 stations with the highest in-degree. What does that mean in real life?
b) Find the 3 stations with the highest out-degree. What does that mean in real life?
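For parts a) and b), the in-degree of a station is the number of distinct routes that end there, and the out-degree is the number of distinct routes that start there. The sketch below illustrates that counting in plain Python rather than Spark. The edge rows and the helpers degrees and top_n are hypothetical, and it counts routes, not trips; a weighted variant would add the trip count instead of 1.

```python
from collections import defaultdict

# Hypothetical edge rows: (start_station, end_station, trip_count).
# The real file in '/user/lab/q2' will have its own layout and values.
edges = [
    ("A", "B", 2),
    ("A", "C", 1),
    ("B", "C", 3),
    ("C", "A", 1),
    ("D", "C", 4),
]

def degrees(edges):
    """Count incoming and outgoing routes per station (unweighted)."""
    in_deg = defaultdict(int)
    out_deg = defaultdict(int)
    for start, end, _count in edges:
        out_deg[start] += 1  # one more route leaving `start`
        in_deg[end] += 1     # one more route arriving at `end`
    return dict(in_deg), dict(out_deg)

def top_n(deg, n=3):
    """Return the n stations with the highest degree, ties broken by name."""
    return sorted(deg.items(), key=lambda kv: (-kv[1], kv[0]))[:n]

in_deg, out_deg = degrees(edges)
top_in = top_n(in_deg)
top_out = top_n(out_deg)
print(top_in)
print(top_out)
```

In Spark the same counting could be expressed as a map to (station, 1) pairs followed by reduceByKey and a sort, once on the end-station column and once on the start-station column.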
Submit your scripts/queries (Pig, Hive, Spark, Hadoop) and the final output (screen copy or screenshot, as appropriate).
Attachment:- Trip and station data.rar