Coursebook for Algoritma Data Science Stack: Static and Dynamic Web Scraping using R
The primary objective of this course is to give participants a comprehensive introduction to the tools and software for web scraping with the popular open-source tool R. The material covers:
Introductory Module:
- Tools Introduction
- R and R Studio
- Open source packages
- Using R Markdown
- R Programming Basics
- Data Wrangling with R's tidyverse
- Working with tabular data in R: Tips and Techniques
- Data Wrangling using dplyr
- Introduction to stringr and stringi for text manipulation
Main Module:
- Legality of Web Scraping
- Website terms and conditions
- robots.txt as the website's rules for web crawlers
- How web scraping works in general
- Intro to HTML and CSS
- Web scraping workflow
- Scraping data from static (non-JavaScript) websites using rvest
- Hands-on web scraping using rvest
- Using CSS selectors
- Building loops to scrape multiple pages
- Scraping data from dynamic (JavaScript-driven) websites and building a browser bot using RSelenium
- Hands-on web scraping using RSelenium
- The difference between the capabilities of RSelenium and rvest
- The interactivity of RSelenium
- Building loops for multiple pages and inputs
- Exploratory Data Analysis
- Wrangling scraped data
- Exporting data
- Optional: Example of a project that uses web scraping to deliver insights