Specifically you must provide the answers and code to the


You must use Hadoop technologies to analyze the Yelp 2016 challenge dataset:

https://www.yelp.com/dataset_challenge.

You can use Hadoop, R/ Apache Spark.

Specifically, you must provide the answers (and code) to the following questions:

1. Summarize the number of reviews by US city, by business category.

2. Rank the cities by # of stars for each category, by city.

3. What is the average rank (stars) for businesses within 800 ft of Times Square, by type? For this problem, assume Times Square is at lat: 40° 45' 32.0256'' N, lon: 73° 59' 6.4680'' W, and 800 ft. to be a square 10 seconds in each direction.

4. Rank reviewers by number of reviews. For the top 10 reviewers, show their average number of stars, by category.

5. For the top 10 and bottom 10 food business in Times Square (in terms of stars), summarize rating by hour of day.

Changes: Since the Yelp academic dataset does not include NY, we need to amend the coordinates:

Center: Carnegie Mellon University, Pitsburgh, PA

Latitude: 40-26'28'' N,  Longitude: 079-56'34'' W

Decimal Degrees: Latitude: 40.4411801, Longitude: -79.9428294

The bounding box for the midterm is ~5 miles, which we will loosely define as 5 minutes. So the bounding box is a square box, 10 minutes each side (of longitude and latitude), with CMU at the center.

  • provide suitable statistical analysis of your results with R.
  • provide visualizations for results (distributions, graphs, maps, in R).

Request for Solution File

Ask an Expert for Answer!!
Dissertation: Specifically you must provide the answers and code to the
Reference No:- TGS01579206

Expected delivery within 24 Hours