Neulhan/piro_crawling

Crawling lecture for Pirogramming, 12th cohort

🗺piro_crawling

print('This is the crawling lecture page for Pirogramming, 12th cohort.')

Environment

jupyter notebook (.ipynb)
google colaboratory (.ipynb)

request

Send an HTTP request to a web page from Python code.

urllib

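import urllib.request  # a bare "import urllib" does not reliably load the request submodule

# url: address of the page to fetch (define it before running this)
urllib_case = urllib.request.urlopen(url)
# read() returns bytes, so decode them into a str
html_text = urllib_case.read().decode("utf-8")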

About binary data in Python
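
The reason for the `.decode("utf-8")` call above can be shown without any network access. This is a minimal sketch: the sample string and the variable names `raw` and `decoded` are made up for illustration, and it assumes the page is encoded as UTF-8.

```python
# A response body arrives as bytes, not str.
# Here an encoded string simulates what urlopen(url).read() would return.
raw = "<p>안녕하세요</p>".encode("utf-8")

print(type(raw))   # the body is a bytes object
print(len(raw))    # length in bytes, not in characters

# Decoding with the page's encoding turns the bytes back into text.
decoded = raw.decode("utf-8")
print(type(decoded))
print(decoded)
```

If the page uses a different encoding, decoding with `"utf-8"` may raise `UnicodeDecodeError` or produce garbled text, so the encoding passed to `decode` should match the one the server actually used.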

requests

import requests

html_text = requests.get(url).text

# html_text holds the HTML document as a str

A blog post comparing urllib and requests

bs4.BeautifulSoup

A blog post that explains clearly what BeautifulSoup is

from bs4 import BeautifulSoup as bs

# create the BeautifulSoup object
soup = bs(html_text, 'html.parser')

# get specific tags from the HTML with a CSS selector
selected_elements = soup.select('selector')

# use the selected tags
# 1. extract the content with .text
# 2. read the tag's attributes with .attrs
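
The select, `.text`, and `.attrs` steps above can be tried on a small inline document. This is only a sketch: the HTML snippet, the selector `ul#posts a.post`, and the names `links`, `titles`, and `hrefs` are hypothetical stand-ins for a real downloaded page.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for the html_text fetched from a page.
html_text = """
<ul id="posts">
  <li><a class="post" href="/posts/1">First post</a></li>
  <li><a class="post" href="/posts/2">Second post</a></li>
</ul>
"""

soup = BeautifulSoup(html_text, "html.parser")

# select() takes a CSS selector and returns a list of matching tags.
links = soup.select("ul#posts a.post")

# 1. .text gives the text inside each tag.
titles = [link.text for link in links]

# 2. .attrs is a dict of the tag's attributes.
hrefs = [link.attrs["href"] for link in links]

print(titles)
print(hrefs)
```

Because `select()` always returns a list, it returns an empty list when nothing matches, so it is worth checking the length before indexing into the result.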