Web-Scraping-in-R

Coursebook for Algoritma Data Science Stack: Static and Dynamic Web Scraping using R

The primary objective of this course is to give participants a comprehensive introduction to the tools and software for web scraping with the popular open-source language R. The material covers:

Introductory Module:

  • Tools Introduction
    • R and RStudio
    • Open source packages
    • Using R Markdown
    • R Programming Basics
  • Data Wrangling with R's tidyverse
    • Working with tabular data in R: Tips and Techniques
    • Data Wrangling using dplyr
    • Introduction to stringr and stringi for text manipulation

Main Module:

  • Legality of Web Scraping
    • Website terms and conditions
    • robots.txt as the website's rules for web crawlers
  • How web scraping works in general
    • Intro to HTML and CSS
    • Web scraping workflow
  • Scraping data from static (non-JavaScript) websites using rvest
    • Hands-on web scraping using rvest
    • Using CSS selectors
    • Building loops for multiple pages
  • Scraping data from JavaScript-rendered websites and building a browser bot using RSelenium
    • Hands-on web scraping using RSelenium
    • The differences in capability between RSelenium and rvest
    • The interactivity of RSelenium
    • Building loops for multiple pages and inputs
  • Exploratory Data Analysis
    • Wrangling scraped data
    • Exporting data
    • Optional: an example project that uses web scraping to deliver insights