Assignment:
Data Import - A
Learning Objectives
In this assignment, you will learn how to:
- read and write Excel files
- shape data
Tasks
Before diving into the programming problems, study the data file "2013 Geographic Coordinate Spreadsheet for US Farmers Markets 8'3'1013" provided for the assignment. Then complete the following tasks:
1. Load the data file into a data frame.
2. The seasons are not standardized, which would make analysis difficult. Create six season levels (Summer, Fall, Winter, Spring, YearRound, HalfYear) and convert each season given in the Season1Date column to one of these levels. Come up with reasonable rules; for example, June to August would be Summer, while 5/1 to 10/30 would be HalfYear. Use string processing functions to parse the strings and shape the data into the categories. If a row is missing its end date, ignore the entire row.
3. Write a retrieval function acceptsWIC() that allows a data scientist to find which markets accept WIC. The function should output a data frame containing only the markets that accept WIC. Use the function and output the data frame.
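One possible shape for tasks 2 and 3 is sketched below in base R. It is only an illustration under stated assumptions: the sample rows are made up, the column names MarketName and WIC (with "Y"/"N" values) are assumed rather than taken from the spreadsheet, the helpers monthOf() and classifySeason() are hypothetical names, and the month-count thresholds are one reasonable rule set among many. With the real file, the sample data frame would be replaced by the loaded spreadsheet (for example via the readxl package, if it is available).

```r
# Hypothetical sample rows; the real spreadsheet has many more columns.
# MarketName and WIC ("Y"/"N") are assumed column names and values.
markets <- data.frame(
  MarketName  = c("Alpha", "Beta", "Gamma", "Delta", "Epsilon", "Zeta"),
  Season1Date = c("June to August", "05/01/2013 to 10/30/2013",
                  "01/01/2013 to 12/31/2013", "09/07/2013 to 11/16/2013",
                  "06/15/2013 to ", "December to February"),
  WIC         = c("Y", "N", "Y", "N", "Y", "Y"),
  stringsAsFactors = FALSE
)

# Turn one endpoint ("06/15/2013" or "June") into a month number, or NA.
monthOf <- function(x) {
  x <- trimws(x)
  if (is.na(x) || x == "") return(NA_integer_)
  if (grepl("^[0-9]{1,2}/", x)) return(as.integer(sub("/.*$", "", x)))
  match(tolower(x), tolower(month.name))   # month.name is built into R
}

# Map one Season1Date string to a season level; NA when an endpoint is missing.
classifySeason <- function(s) {
  parts <- strsplit(s, "\\s+to\\s*")[[1]]
  if (length(parts) < 2) return(NA_character_)
  from <- monthOf(parts[1])
  to   <- monthOf(parts[2])
  if (is.na(from) || is.na(to)) return(NA_character_)
  # Number of calendar months covered, wrapping past December if needed.
  span <- (to - from) %% 12 + 1
  if (span >= 10) return("YearRound")   # assumed threshold
  if (span >= 5)  return("HalfYear")    # assumed threshold
  # Short seasons are labelled by the middle month of the span.
  mid <- (from + (span - 1) %/% 2 - 1) %% 12 + 1
  c("Winter", "Winter", "Spring", "Spring", "Spring", "Summer",
    "Summer", "Summer", "Fall", "Fall", "Fall", "Winter")[mid]
}

markets$Season <- vapply(markets$Season1Date, classifySeason, character(1),
                         USE.NAMES = FALSE)
markets <- markets[!is.na(markets$Season), ]   # drop rows without an end date
markets$Season <- factor(markets$Season,
  levels = c("Summer", "Fall", "Winter", "Spring", "YearRound", "HalfYear"))

# Return only the markets that accept WIC (assumes WIC holds "Y" or "N").
acceptsWIC <- function(df) {
  df[!is.na(df$WIC) & df$WIC == "Y", , drop = FALSE]
}

wicMarkets <- acceptsWIC(markets)
print(wicMarkets)
```

Passing the data frame in as a parameter keeps acceptsWIC() free of global state, so the same function works on the sample and on the full data.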
Data Import - B
Learning Objectives: In this assignment, you will learn how to:
- read and parse XML
- retrieve data from XML
Tasks
1. Load the XML document at the URL (https://www.xmldatasets.net/temp/179681356453762.xml) directly into a data frame.
2. Write a function senatorName() that returns the names of the senators for a given state, i.e., the function takes a state as an argument and returns the names of the senators for that state in a vector.
3. Write a function senatorPhone() that returns the phone number for a given senator. The function should take the first and last name of the senator as an argument, parse the name, search for a match in the data frame and return the phone number for that senator.
Be sure to call your functions to test that they work. Use local variables only and pass any information your functions need as parameters.
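The two lookup functions could be sketched as follows, assuming the XML package is installed. The inline document is a made-up stand-in for the real one, and the tag names (member, first_name, last_name, state, phone) are assumptions about its layout; fetching the actual URL is left out. The data frame is passed in as a parameter so that the functions use local variables only.

```r
library(XML)  # assumes the XML package is installed

# Hypothetical stand-in for the downloaded document (names are invented).
senatorXml <- paste0(
  "<contact_information>",
  "<member><first_name>Ann</first_name><last_name>Example</last_name>",
  "<state>MA</state><phone>(202) 555-0101</phone></member>",
  "<member><first_name>Bob</first_name><last_name>Sample</last_name>",
  "<state>MA</state><phone>(202) 555-0102</phone></member>",
  "<member><first_name>Cara</first_name><last_name>Placeholder</last_name>",
  "<state>NH</state><phone>(202) 555-0103</phone></member>",
  "</contact_information>")

# Parse the text and flatten each <member> node into one data frame row.
doc      <- xmlParse(senatorXml, asText = TRUE)
senators <- xmlToDataFrame(doc, stringsAsFactors = FALSE)

# Names of the senators for a given state, as a character vector.
senatorName <- function(df, state) {
  rows <- df[toupper(df$state) == toupper(state), ]
  paste(rows$first_name, rows$last_name)
}

# Phone number for a senator given as "First Last".
senatorPhone <- function(df, fullName) {
  parts <- strsplit(trimws(fullName), "\\s+")[[1]]
  first <- parts[1]
  last  <- parts[length(parts)]
  hit <- tolower(df$first_name) == tolower(first) &
         tolower(df$last_name)  == tolower(last)
  df$phone[hit]
}

print(senatorName(senators, "MA"))
print(senatorPhone(senators, "Cara Placeholder"))
```

Both functions return an empty character vector when nothing matches, which a caller can check with length().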
Data Import - C
Learning Objectives: In this assignment, you will learn how to:
- read and parse XML
- retrieve data from XML
Tasks
1. Load and then parse the XML document at the URL (https://www.cs.washington.edu/research/xmldatasets/data/auctions/ebay.xml) using xmlTreeParse(). The data set contains bidding information about items on eBay. Create any intermediate data objects you need to write a function named moreFiveBids() that answers the following question: how many auctions had more than 5 bids? Use the function to output the answer.
2. Take a look at the data set of ESZ13 futures trades during a single day at the URL https://www.barchartmarketdata.com/datasamples/getHistory15.xml. After loading the data, write and use the following functions to answer these retrieval queries:
a. highestClosingPrice() answers the question: what was the highest closing price for the security?
b. totalVolume() answers the question: what was the total volume traded?
c. averageVolume() answers the question: what was the average trading volume during each HOUR of the trading day? The function should place the result into a data frame containing the hour and average trading volume for that hour.
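A minimal sketch of the four retrieval functions follows, assuming the XML package is installed. Everything about the data layout is an assumption: the auction sample uses listing/auction_info/num_bids as the path to the bid count, and the trades are shown as an already loaded data frame with hypothetical columns timestamp, close and volume holding made-up values.

```r
library(XML)  # assumes the XML package is installed

# Hypothetical auction sample; the path to num_bids is an assumption.
auctionXml <- paste0(
  "<root>",
  "<listing><auction_info><num_bids>12</num_bids></auction_info></listing>",
  "<listing><auction_info><num_bids>3</num_bids></auction_info></listing>",
  "<listing><auction_info><num_bids>6</num_bids></auction_info></listing>",
  "<listing><auction_info><num_bids>5</num_bids></auction_info></listing>",
  "</root>")

# Count the auctions with strictly more than five bids.
moreFiveBids <- function(xmlText) {
  doc  <- xmlTreeParse(xmlText, asText = TRUE, useInternalNodes = TRUE)
  bids <- xpathSApply(doc, "//listing/auction_info/num_bids", xmlValue)
  sum(as.integer(trimws(bids)) > 5)
}

# Hypothetical trades after loading; column names and values are invented.
trades <- data.frame(
  timestamp = c("2013-09-26 09:00:00", "2013-09-26 09:30:00",
                "2013-09-26 10:00:00", "2013-09-26 10:45:00"),
  close  = c(1687.25, 1688.50, 1686.75, 1689.00),
  volume = c(100, 300, 200, 600),
  stringsAsFactors = FALSE
)

highestClosingPrice <- function(df) max(df$close)

totalVolume <- function(df) sum(df$volume)

# Mean volume per hour of day, returned as a data frame.
averageVolume <- function(df) {
  hour <- as.integer(substr(df$timestamp, 12, 13))  # "HH" in "YYYY-MM-DD HH:MM:SS"
  avg  <- tapply(df$volume, hour, mean)
  data.frame(hour = as.integer(names(avg)), avgVolume = as.vector(avg))
}

print(moreFiveBids(auctionXml))
print(highestClosingPrice(trades))
print(totalVolume(trades))
print(averageVolume(trades))
```

The hour is taken from fixed character positions, which only holds if the timestamps really follow the assumed "YYYY-MM-DD HH:MM:SS" layout; a different layout would need a different extraction step.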
Data Import - D
Learning Objectives: In this assignment, you will learn how to:
- read and parse text files
Tasks
1. Load the IMDB movie listing from the file movies.list.gz. Note that the file is compressed, so you need to figure out how to uncompress it in R. Inspect the file and determine how best to load it; this is not an XML file and it requires custom string parsing.
2. Parse the data. You should identify all the fields and their meanings within the file. Place the data into a data frame suitable for further analysis.
3. Comment your code where you identify the movie rows that are part of your result set.
4. Your result set should only contain movie title and movie release year. Your result set should NOT include rows for TV shows. You can identify the movies within the data file (look for a special marking field or some other indication). Make any other assumptions you need, but comment your assumptions. For a cleaner result set, look for duplicate titles and remove the duplicates.
5. Use correct syntax, good coding style and a readable code format. If the data size is overwhelming, you can build a smaller subset of the file that is easier to test with, loads faster and is representative of the file (for example, a random sample of rows). This is a common technique when building data loaders, but it is for debugging purposes only.
Your submitted assignment should be run against the complete data set.
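The loading and parsing steps above could be sketched in base R as follows. The line layout is an assumption to be checked against the real file: a title, a year in parentheses, optional markers, one or more tabs, then the year again. It is also assumed that titles starting with a double quote are TV series and that (TV), (V) and (VG) mark TV movies, videos and video games. The sample lines and the function name parseMovies() are hypothetical, and duplicates are removed on the title and year pair, which is one possible reading of "duplicate titles".

```r
# Hypothetical sample lines written to a temporary .gz file for illustration.
sampleLines <- c(
  "MOVIES LIST",
  "===========",
  "\"Some Show\" (2005)\t\t\t2005-2007",
  "\"Some Show\" (2005) {Pilot (#1.1)}\t2005",
  "A Film (1999)\t\t\t\t1999",
  "A Film (1999)\t\t\t\t1999",
  "Another Film (2010/II)\t\t\t2010",
  "Made for TV (2001) (TV)\t\t\t2001",
  "Unknown Year (????)\t\t\t????"
)
path <- tempfile(fileext = ".gz")
con <- gzfile(path, "w")
writeLines(sampleLines, con)
close(con)

# Read a gzip-compressed listing and return movie titles with release years.
# sampleSize optionally keeps a random subset of lines (debugging only).
parseMovies <- function(path, sampleSize = NULL) {
  con <- gzfile(path, "r")             # gzfile() decompresses while reading
  on.exit(close(con))
  lines <- readLines(con, warn = FALSE)
  if (!is.null(sampleSize) && sampleSize < length(lines)) {
    lines <- sample(lines, sampleSize)
  }
  # Keep only data rows: "(year...)" followed somewhere by a tab.
  lines <- grep("\\(([0-9]{4}|\\?{4})[^)]*\\).*\t", lines, value = TRUE)
  # Movie rows: not quoted (assumed TV series) and without (TV)/(V)/(VG).
  isMovie <- !grepl("^\"", lines) & !grepl("\\((TV|V|VG)\\)", lines)
  lines <- lines[isMovie]
  # Title is everything before the parenthesised year.
  title <- sub("\\s*\\(([0-9]{4}|\\?{4})[^)]*\\).*$", "", lines)
  # Year is the text after the last tab; "????" becomes NA.
  year <- suppressWarnings(as.integer(sub("^.*\t", "", lines)))
  movies <- data.frame(title = title, year = year, stringsAsFactors = FALSE)
  movies[!duplicated(movies), ]        # drop repeated title/year pairs
}

movies <- parseMovies(path)
print(movies)
```

Calling parseMovies(path, sampleSize = 1000) would exercise the same code on a random subset while debugging; the final run should omit sampleSize so the complete file is parsed.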
Assignment Link -
https://www.dropbox.com/s/pewqshz7oylv26w/Assignment.rar?dl=0