In Section 11.8.3 we saw that the latitudes and longitudes for the countries are available on the Web, embedded in the page at https://dev.maxmind.com/geoip/legacy/codes/average-latitude-and-longitude-for-countries/. A snippet of the HTML source for this page is shown in Figure 11.10. Extract these location coordinates from this HTML file.
Examine the HTML source, and notice that the data are simply placed as plain text within a node in the document. If we can extract the contents of this node, then we can place this information in a data frame. Begin by parsing the HTML document with htmlParse(). Don't download the document; simply pass the function the URL, https://dev.maxmind.com/geoip/legacy/codes/average-latitude-and-longitude-for-countries.
Next, access the root of the document using xmlRoot(), and use an XPath expression to locate the nodes in the document.
The getNodeSet() function should be useful here. It takes the XML tree (or subtree) and an XPath expression as input and returns a list of all nodes in the tree that match this expression.
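The parse, root, and node-set steps can be tried out on a small inline document before pointing them at the real page. The following is only a sketch: the HTML string and its values are made up, and the choice of a `<pre>` node (and hence the XPath expression "//pre") is an assumption to be checked against Figure 11.10.

```r
library(XML)

# Made-up stand-in for the real page; the values are hypothetical.
# The <pre> tag is an assumption, not taken from the actual source.
html <- '<html><body><h1>Demo</h1><pre>
"Code",Latitude,Longitude
AA,10.5,20.5
BB,-30.25,40.75
</pre></body></html>'

# asText = TRUE tells htmlParse() that the argument is HTML content,
# not a file name or URL. For the exercise, pass the URL instead.
doc <- htmlParse(html, asText = TRUE)
root <- xmlRoot(doc)

# "//pre" matches every <pre> node anywhere below the root.
nodes <- getNodeSet(root, "//pre")

# Check how many nodes were found.
length(nodes)
```

If the real page keeps the data in a different element, only the XPath expression needs to change.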
Once you have located these nodes, check to see how many were found. It should be only one. The latitude and longitude values are in the text content of this node. Extract the text content with the xmlValue() function. It should look something like:
The content is one long character string containing all the data.
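To see why the result is a single string rather than one element per line, here is a minimal sketch on a made-up document. The `<pre>` node and its two lines of text are placeholders, not the contents of the real page.

```r
library(XML)

# Hypothetical two-line text node standing in for the real data.
doc <- htmlParse("<html><body><pre>line one\nline two\n</pre></body></html>",
                 asText = TRUE)
node <- getNodeSet(doc, "//pre")[[1]]

# xmlValue() returns the text content of the node as a character string.
txt <- xmlValue(node)

length(txt)   # number of elements in the character vector
nchar(txt)    # number of characters in that one string
```

The line breaks survive inside the string as newline characters, which is what lets a text-reading function split it into rows later.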
Complete the exercise by reading the plain text in your character vector into a data frame. Use read.table() to do this. The parameters text, skip, header, and sep should be useful here.
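As a rough illustration of how these four parameters work together, the sketch below reads a made-up string. The explanatory first line, the column names, and the values are all hypothetical; the number of lines to skip on the real page has to be found by inspecting the extracted text.

```r
# Made-up stand-in for the string extracted from the node.
txt <- paste0("Explanatory line to skip\n",
              "Code,Latitude,Longitude\n",
              "AA,10.5,20.5\n",
              "BB,-30.25,40.75\n")

# text   : read from the string instead of a file
# skip   : number of leading lines to ignore before the header
# header : treat the first line read as column names
# sep    : fields are separated by commas
latlon <- read.table(text = txt, skip = 1, header = TRUE, sep = ",")

str(latlon)
```

With these settings the two numeric columns are read as numbers, so they can be used directly for plotting or merging.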