I will put code, notes, R sessions from live coding in class, and other course material in this GitHub repository; some of it will also be on Canvas.
-
- cumsum() and grouping lines
- Related to and might be useful for assignment 2 - Spam Email.
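A minimal sketch of the cumsum() idiom for grouping lines. The sample lines and the choice of "From " as the start-of-record marker are illustrative assumptions, not the assignment's actual data. The idea is to flag each line that starts a new record and take the cumulative sum of the flags, which gives a group id for every line.

```r
# Hypothetical input: each record starts with a line beginning "From ".
txt <- c("From alice", "Subject: hi", "body 1",
         "From bob", "Subject: re", "body 2", "body 3")

# TRUE on the first line of each record, FALSE elsewhere.
starts <- grepl("^From ", txt)

# cumsum() turns the flags into a running record number for every line.
group <- cumsum(starts)

# split() collects the lines of each record into one list element.
records <- split(txt, group)
```

Each element of `records` can then be handed to a function that processes a single record.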
-
Lectures
-
- Slides
- Description of HTML tree.
- NYTimes map example
- Data in CSV via separate background download
- NYT.R
- Marine Traffic
- Data in JSON via separate background download
- A simple HTTP request with readLines() doesn't work
- Stats StackExchange
- Static HTML content in the question-summary front page
- Get links to page for each question.
- Firefox Developer Tools
- R session
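A small sketch of getting the link for each question with XPath. It assumes the XML package is installed. The HTML is made up to stand in for a question-summary page, and the class name `q-link` is hypothetical; use the browser's developer tools to find the real markup.

```r
library(XML)  # assumption: the XML package is installed

# Made-up HTML standing in for a question-summary page.
# The class name "q-link" is hypothetical, not the real site's markup.
html <- '<html><body>
  <div class="summary"><a class="q-link" href="/questions/1/first">First</a></div>
  <div class="summary"><a class="q-link" href="/questions/2/second">Second</a></div>
  <a href="/help">Help</a>
</body></html>'

doc <- htmlParse(html, asText = TRUE)

# XPath: the href attribute of every <a> element with the chosen class.
hrefs <- xpathSApply(doc, "//a[@class = 'q-link']/@href")

# The links are relative, so prepend the site's base URL.
urls <- paste0("https://stats.stackexchange.com", unname(hrefs))
```

The `/help` link is skipped because the XPath predicate only matches anchors with the question class.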
-
- Code to process search results/pages of questions from stats.stackexchange.com
- This is a good structure for harvesting posts/questions/etc. when we have page after
page of search results.
In other words, consider using this structure for assignment 4 and specializing it
to Craigslist. The components correspond to:
- loop over pages and append the results
- process a page of results
- process each result, e.g., get URL for actual post/question
- get URL or HTML for the next page of results
- fix/post-process the columns in the overall data.frame
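The components above can be sketched as a skeleton like the following. This is not the code from class: all function names are hypothetical, and an in-memory list of fake "pages" stands in for fetching and parsing HTML so that the sketch runs without a network connection.

```r
# Stand-in for the site: each "page" holds result titles and the name of the
# next page (NA on the last page). Real code would fetch and parse HTML here.
pages <- list(
  p1 = list(titles = c(" a ", "b"), nxt = "p2"),
  p2 = list(titles = c("c"),        nxt = NA_character_)
)

# Process each result, e.g., get its title and the URL for the actual post.
processResult <- function(title)
  data.frame(title = title, url = paste0("/post/", trimws(title)),
             stringsAsFactors = FALSE)

# Process a page of results: one row per result.
processPage <- function(page)
  do.call(rbind, lapply(page$titles, processResult))

# Get the identifier of the next page of results, or NA when there is none.
getNextPage <- function(page) page$nxt

# Fix/post-process the columns in the overall data.frame.
fixColumns <- function(df) {
  df$title <- trimws(df$title)
  df
}

# Loop over pages and append the results; maxPages guards against endless loops.
harvest <- function(start, maxPages = 100) {
  results <- list()
  id <- start
  while (!is.na(id) && length(results) < maxPages) {
    page <- pages[[id]]
    results[[length(results) + 1]] <- processPage(page)
    id <- getNextPage(page)
  }
  fixColumns(do.call(rbind, results))
}

posts <- harvest("p1")
```

Keeping each component in its own function means only `processResult()`, `processPage()` and `getNextPage()` need to change when the structure is specialized to another site.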
-
- Direct/Low-level Approach
- Discusses generating SVG plots directly from R using R graphics (including ggplot2) and then annotating them from R using XPath and XML manipulation.
- Annotating SVG plots - succinct
- Stand-alone version of Animated Map
- This directly inlines the SVG documents in the HTML and avoids the cross-origin security issue which we dealt with previously by serving the files via a Web server.
- Annotating ggplot plots
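A minimal sketch of the annotation step in the direct approach. It assumes the XML package is installed. The SVG is a tiny hand-written stand-in for one produced by an R graphics device, and the labels and the `onmouseover` handler are hypothetical; a real plot needs an XPath query that matches how the device draws its points.

```r
library(XML)  # assumption: the XML package is installed

# Hand-written SVG standing in for the output of an R graphics device.
svgText <- '<svg xmlns="http://www.w3.org/2000/svg" width="100" height="50">
  <circle cx="20" cy="25" r="5"/>
  <circle cx="60" cy="25" r="5"/>
</svg>'

doc <- xmlParse(svgText, asText = TRUE)

# SVG elements live in a namespace, so the XPath query needs a prefix for it.
pts <- getNodeSet(doc, "//s:circle", c(s = "http://www.w3.org/2000/svg"))

# Hypothetical labels, one per plotted point.
labels <- c("first point", "second point")

# Add an id and a mouse-over handler to each point; the nodes are modified in place.
for (i in seq_along(pts))
  addAttributes(pts[[i]], id = paste0("pt", i),
                onmouseover = sprintf("console.log('%s')", labels[i]))

# Serialize the annotated document back to a string.
annotated <- saveXML(doc)
```

The resulting string can be written to a file or inlined in an HTML page.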