Problem Assignment
The Enron scandal led to the bankruptcy of the Enron Corporation, the largest bankruptcy reorganization in US history at that time, and to the dissolution of Arthur Andersen, one of the five largest audit and accountancy partnerships in the world. In this exercise, you will download text on the scandal available through Wikipedia and filter it to the sentences dealing with Kenneth Lay, one of the main figures in the scandal.
Download the source code of the following Wikipedia page: Enron scandal. Use readLines(...).
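As a rough sketch of this step: readLines() returns one character element per line, whether it is given a URL or a file path. The example below reads a small local file that stands in for the downloaded page, so it runs without a network connection. The file contents and the variable names (tmp, page) are made up for illustration, and the URL in the comment is an assumption about where the article lives.

```r
# Hypothetical stand-in for the Wikipedia page: a tiny local HTML file.
# For the real exercise, pass the page URL to readLines() instead, e.g.
#   page <- readLines("https://en.wikipedia.org/wiki/Enron_scandal")
# (URL assumed, not given in the assignment text).
tmp <- tempfile(fileext = ".html")
writeLines(c("<html>",
             "<h1>Enron scandal</h1>",
             "<p>Kenneth Lay was the chairman of Enron.</p>",
             "</html>"),
           tmp)

# Each line of the file becomes one element of a character vector.
page <- readLines(tmp, warn = FALSE)
length(page)
```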
Go to the same webpage in your browser and look at the source code (Google Chrome: right-click and select "View page source"). All lines that include text from the main body (no headers, info boxes, etc.) start with the same HTML tag, namely <p>. Use a regular expression to limit the downloaded data to lines that include text from the main body. Use grep(...).
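One way this filter might look, assuming the opening tag of main-body lines is <p> (the paragraph tag). The anchor ^ restricts the match to the start of the line, and value = TRUE makes grep() return the matching lines rather than their positions. The sample vector page is invented for illustration.

```r
# Hypothetical sample of downloaded lines (not real page content).
page <- c("<h1>Enron scandal</h1>",
          "<p>Enron was based in <a href=\"/wiki/Houston\">Houston</a>.</p>",
          "<table class=\"infobox\">",
          "<p>Kenneth Lay was its chairman.</p>")

# Keep only lines that begin with the paragraph tag (assumed to be <p>).
body_lines <- grep("^<p>", page, value = TRUE)
body_lines
```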
Remove the HTML tags using gsub(...). HTML tags always have the same format, namely a certain number of characters within angle brackets ('<' and '>'). Write a regular expression that captures all HTML tags.
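A minimal sketch of one possible pattern: an opening angle bracket, one or more characters that are not a closing bracket, then the closing bracket. Using [^>]+ instead of .+ keeps each match confined to a single tag. The input string is a made-up example.

```r
# Hypothetical line containing nested HTML tags.
body_lines <- "<p>Enron was based in <a href=\"/wiki/Houston\">Houston</a>.</p>"

# "<[^>]+>" matches '<', then any run of non-'>' characters, then '>'.
# Replacing every match with "" leaves only the visible text.
body_text <- gsub("<[^>]+>", "", body_lines)
body_text
```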
We want to construct a vector where each element is a single sentence, which is currently not the case. First, collapse the current vector into one character string using paste(..., collapse = " "). Subsequently, separate the string again at the end of individual sentences. We assume that '.' is the only sentence separator. However, '.' is also a special character in regular expressions. In order to use '.' as a full stop, and not as the metacharacter 'any character', escape it with backslashes as shown below. In addition, use the suffix [[1]], as the output is a list.
strsplit(..., "\\.")[[1]]
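Put together, the collapse-and-split step could be sketched as follows. The input vector is invented, and the trimws() call is an optional extra (not requested by the assignment) that removes the space left at the start of each sentence after splitting.

```r
# Hypothetical cleaned lines; one element may hold several sentences.
body_text <- c("Enron filed for bankruptcy. Kenneth Lay was chairman.",
               "Arthur Andersen was dissolved.")

# Collapse everything into a single string ...
one_string <- paste(body_text, collapse = " ")

# ... then split at literal full stops; strsplit() returns a list,
# so [[1]] extracts the character vector of sentences.
sentences <- strsplit(one_string, "\\.")[[1]]

# Optional: drop the leading space left over from the split.
sentences <- trimws(sentences)
sentences
```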
Find all sentences that include the term "kenneth lay", ignoring case.
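A short sketch of the case-insensitive search, using grep() with ignore.case = TRUE so that "Kenneth Lay", "KENNETH LAY", and "kenneth lay" all match. The sentences below are placeholder data, not text from the article.

```r
# Hypothetical sentences for illustration.
sentences <- c("Enron filed for bankruptcy",
               "Kenneth Lay was chairman",
               "KENNETH LAY resigned",
               "Arthur Andersen was dissolved")

# ignore.case = TRUE matches regardless of capitalization;
# value = TRUE returns the sentences themselves.
lay_sentences <- grep("kenneth lay", sentences, ignore.case = TRUE, value = TRUE)
lay_sentences
```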
Save the resulting vector of sentences in a text file named enron_scandal.txt. Make sure that the resulting file does not have column names, row names, or quotation marks around the individual entries.
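One possible way to write the file is write.table() with quoting, row names, and column names switched off. In this sketch the file is placed in a temporary directory so the example leaves nothing behind; for the assignment itself, file = "enron_scandal.txt" would write to the working directory. The sentences are placeholder data.

```r
# Hypothetical result vector from the previous step.
lay_sentences <- c("Kenneth Lay was chairman", "KENNETH LAY resigned")

# Written to tempdir() here only to keep the example self-contained.
out_file <- file.path(tempdir(), "enron_scandal.txt")

# No quotation marks, no row names, no column names: one sentence per line.
write.table(lay_sentences, file = out_file,
            quote = FALSE, row.names = FALSE, col.names = FALSE)

# Read the file back to inspect the result.
readLines(out_file)
```

writeLines(lay_sentences, out_file) would produce the same plain one-sentence-per-line file for a character vector.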