1st-PyCrawlerMarathon

Day Contents Remarks
day 001 download file, file I/O
day 002 csv file handling
day 003 xml file handling
day 004 API POKE
day 005 API + JSON
day 006 Headers
day 007
day 008 Static Webpage Crawling
day 009 download images
day 010 Packages: PyQuery/grab
day 011 Regular Expression
day 012 Ex. ETtoday
day 013 Ex. PTT
day 014 Ex. Yahoo! movie
day 015 Ex. Bank of Taiwan
day 016 Ex. Wiki recursive crawling
day 017
day 018 about "headers"...
day 019 Ex. ETtoday: Selenium + BeautifulSoup
day 020 API operation
day 021 Ex. ETtoday dynamic web pages
day 022 Ex. Air Quality Website
day 023 Ex. ETtoday.net: get external website content
day 024 Ex. 104 HR
day 025 Scrapy Intro. (no HW)
day 026 Scrapy: Request
day 027 Scrapy: XPath + Item Pipeline
day 028 Scrapy: API
day 029 Scrapy: multiple web pages
day 030 some challenges
day 031 headers
day 032 captcha
day 033 login
day 034 proxy IP
day 035 multithreading
day 036 asynchronous crawling
day 037 scheduled crawling